Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemente...
详细信息
Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemented, as well as, the type of methodology utilized by an algorithm. In this paper, we determine the parallel complexity of multiplying two (not necessarily square) matrices on parallel distributed-memory machines and/or networks. In other words, we provided an achievable parallel run-time that can not be beaten by any algorithm (known or unknown) for solving this problem. In addition, any algorithm that claims to be optimal must attain this run-time. In order to obtain results that are general and useful throughout a span of machines, we base our results on the well-known LogP model. Furthermore, three important criteria must be considered in order to determine the running time of a parallel algorithm;namely, (i) local computational tasks, (ii) the initial data layout, and (iii) the communication schedule. We provide optimality results by first proving general lower bounds on parallel run-time. These lower bounds lead to significant insights on (i)-(iii) above. In particular, we present what types of data layouts and communication schedules are needed in order to obtain optimal run-times. We prove that no one data layout can achieve optimal running times for all cases. Instead, optimal layouts depend on the dimensions of each matrix, and on the number of processors. Lastly, optimal algorithms are provided.
The size of a time step is important for numerical weather prediction models (NWP) since forecasts need to be available within the fraction of time that may considered to be valid. However, time step size is often lim...
详细信息
ISBN:
(纸本)0769520693
The size of a time step is important for numerical weather prediction models (NWP) since forecasts need to be available within the fraction of time that may considered to be valid. However, time step size is often limited by the numerical stability of the used advection schemes. Available schemes include semi-implicit Eulerian and semi-Lagrangian schemes. In principal, semi-Lagrangian formulations result in irregular communications on parallel architectures. In this paper we describe automatic code generation for a semi-implicit scheme with a semi-Lagrangian formulation. We describe how code can be generated from a mathematical specification of the advection model and we show results from preliminary experiments we have conducted with the generated code and the reference code from a production NWP on a number of different architectures.
Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemente...
详细信息
Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemented, as well as, the type of methodology utilized by an algorithm. In this paper, we determine the parallel complexity of multiplying two (not necessarily square) matrices on parallel distributed-memory machines and/or networks. In other words, we provided an achievable parallel run-time that can not be beaten by any algorithm (known or unknown) for solving this problem. In addition, any algorithm that claims to be optimal must attain this run-time. In order to obtain results that are general and useful throughout a span of machines, we base our results on the well-known LogP model. Furthermore, three important criteria must be considered in order to determine the running time of a parallel algorithm;namely, (i) local computational tasks, (ii) the initial data layout, and (iii) the communication schedule. We provide optimality results by first proving general lower bounds on parallel run-time. These lower bounds lead to significant insights on (i)-(iii) above. In particular, we present what types of data layouts and communication schedules are needed in order to obtain optimal run-times. We prove that no one data layout can achieve optimal running times for all cases. Instead, optimal layouts depend on the dimensions of each matrix, and on the number of processors. Lastly, optimal algorithms are provided.
In recent papers, the problem of estimating the thickness and the optical constants (refractive index and absorption coefficient) of thin films using only transmittance data has been addressed by means of optimization...
详细信息
In recent papers, the problem of estimating the thickness and the optical constants (refractive index and absorption coefficient) of thin films using only transmittance data has been addressed by means of optimization techniques. Models were proposed for solving this problem using linearly constrained optimization and unconstrained optimization. However, the optical parameters of "very thin" films could not be recovered with methods that are successful in other situations. Here we introduce an optimization technique that seems to be efficient for recovering the parameters of very thin films. (C) 2003 IMACS. Published by Elsevier B.V. All rights reserved.
The article describes a numerical method of optimal control computation based on the idea of reduction the original infinite-dimensional problem into finite dimension using systems of base functions. The method can be...
详细信息
The article describes a numerical method of optimal control computation based on the idea of reduction the original infinite-dimensional problem into finite dimension using systems of base functions. The method can be considered as a generalized discretization. Although many kinds of base systems can be used, a special class of compact support base systems provide a fast way of determining gradient of the criteria functional which can be used with advantage in optimization.
The discrete-time optimal projection equations, which constitute necessary conditions fir optimal reduced-order LQG compensation, are strengthened. For the class of minimal stabilizing compensators the strengthened di...
详细信息
The discrete-time optimal projection equations, which constitute necessary conditions fir optimal reduced-order LQG compensation, are strengthened. For the class of minimal stabilizing compensators the strengthened discrete-time optimal projection equations are proved to he equivalent to first-order necessary optimality conditions for optimal reduced-order LQG compensation. The conventional discrete-time optimal projection equations are pro red to be weaker. As a result solutions of the conventional discrete-time optimal projection equations may not correspond to optimal reduced-order compensators. Through numerical examples it is demonstrated that, in fact, many solutions exist that do not correspond to optimal reduced-order compensators. To compute optimal reduced-order compensators two licit, algorithms are proposed. One is a homotopy algorithm and one is based oil iteration of the strengthened discrete-time optimal projection equations. The latter algorithm is a generalization of the algorithm that solves the two Riccati equations of full-order LQG control through iteration and therefore is highly efficient. Using different initializations of the iterative algorithm it is demonstrated that the reduced-order optimal LQG compensation problem, in general, may possess multiple extrema. Through two computer experiments it is demonstrated that the homotopy algorithm often, but not always, finds the global minimum.
A mathematical formulation of a realistic optimal greenhouse open-loop control problem for achieving a specified lettuce nitrate concentration is expounded. Like many complex agricultural and (bio)chemical optimal con...
详细信息
A mathematical formulation of a realistic optimal greenhouse open-loop control problem for achieving a specified lettuce nitrate concentration is expounded. Like many complex agricultural and (bio)chemical optimal control problems, this problem consisted of non-linear differential algebraic equations, affected by high frequency changing uncontrollable inputs, possibly conflicting path constraints and end-constraints. By reformulating the problem into a problem with isoperimetric constraints in order to deal properly with the path constraints it was cast in a form suitable for a numerical solution with the ACW-gradient optimisation. The optimisation was successful, as is shown in an example.
A nonlinear optimization-based identification procedure for fully parameterized multivariable state-space models is presented. The method can be used to identify linear time-invariant, linear parameter-varying, compos...
详细信息
A nonlinear optimization-based identification procedure for fully parameterized multivariable state-space models is presented. The method can be used to identify linear time-invariant, linear parameter-varying, composite local linear, bilinear. Hammerstein and Wiener systems. The nonuniqueness of the full parameterization is dealt with by a projected gradient search to solve the nonlinear optimization problem. Both white and nonwhite measurement noise at the output can be dealt with in a maximum likelihood setting. It is proposed to use subspace identification methods to initialize the nonlinear optimization problem. A computationally efficient and numerically reliable implementation of the procedure is discussed in detail.
The present paper is concerned with the numerical modelling of the large elastic plastic deformation behavior and localization prediction of ductile metals which are sensitive to hydrostatic stress and anisotropically...
详细信息
The present paper is concerned with the numerical modelling of the large elastic plastic deformation behavior and localization prediction of ductile metals which are sensitive to hydrostatic stress and anisotropically damaged. The model is based on a generalized macroscopic theory within the framework of nonlinear continuum damage mechanics. The formulation relies on a multiplicative decomposition of the metric transformation tensor into elastic and damaged-plastic parts. Furthermore, undamaged configurations are introduced which are related to the damaged configurations via associated metric transformations which allow for the interpretation as damage tensors. Strain rates are shown to be additively decomposed into elastic, plastic and damage strain rate tensors. Moreover, based on the standard dissipative material approach the constitutive framework is completed by different stress tensors, a yield criterion and a separate damage condition as well as corresponding potential functions. The evolution laws for plastic and damage strain rates are discussed in some detail. Estimates of the stress and strain histories are obtained via an explicit integration procedure which employs an inelastic (damage-plastic) predictor followed by an elastic corrector step. numerical simulations of the elastic-plastic deformation behavior of damaged solids demonstrate the efficiency of the formulation. A variety of large strain elastic-plastic-damage problems including severe localization is presented, and the influence of different model parameters on the deformation and localization prediction of ductile metals is discussed. (C) 2002 Elsevier Science Ltd. All rights reserved.
作者:
Oran, ESUSN
Res Lab Lab Computat Phys & Fluid Dynam Washington DC 20375 USA
The evolution of the science and art of numerical simulation of complex, complicated fluid flows has made enormous strides in the past 40 years. We have progressed from relatively simplified one-dimensional steady-sta...
详细信息
The evolution of the science and art of numerical simulation of complex, complicated fluid flows has made enormous strides in the past 40 years. We have progressed from relatively simplified one-dimensional steady-state results to fully three-dimensional, time-dependent simulations including very complex physics. These advances have been driven by new computational hardware, new algorithms for solving the equations, and the real need for this technology. The broad range of applications that are possible are emphasized and some of what we can now do, what we have learned, and where we might go with this exciting technology in the future is described.
暂无评论