In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) inference on a series of ARM-based processor architectures. Specifically, we evaluate the NVIDIA Denver2 and Carmel processors, as well as the ARM Cortex-A57 and Cortex-A78AE CPUs as part of a recent set of NVIDIA Jetson platforms. The performance-energy evaluation is carried out using the ResNet-50 v1.5 convolutional neural network (CNN) on varying configurations of convolution algorithms, number of threads/cores, and operating frequencies on the tested processor cores. The results demonstrate that the best throughput is obtained on all platforms with the Winograd convolution operator running on all the cores at their highest frequency. However, if the goal is to reduce the energy footprint, there is no rule of thumb for the optimal configuration.
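As an illustration of two of the algorithm families named in this abstract, the following minimal sketch compares a direct convolution with an explicit lowering (im2col followed by a matrix-vector product). It assumes a single-channel 2D input, stride 1, no padding, and the cross-correlation convention common in DL frameworks; the function names are hypothetical and are not taken from the evaluated codes.

```python
def conv2d_direct(image, kernel):
    """Direct algorithm: slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            for di in range(kh):
                for dj in range(kw):
                    out[i][j] += image[i + di][j + dj] * kernel[di][dj]
    return out


def im2col(image, kh, kw):
    """Explicit lowering: copy every kh x kw patch into one row of a matrix."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(oh)
        for j in range(ow)
    ]


def conv2d_lowered(image, kernel):
    """Lowered algorithm: convolution becomes a matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    ow = len(image[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    patches = im2col(image, kh, kw)
    flat = [sum(p * w for p, w in zip(patch, flat_kernel)) for patch in patches]
    # Reshape the flat result back into output rows of width ow.
    return [flat[r:r + ow] for r in range(0, len(flat), ow)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
print(conv2d_direct(image, kernel))
print(conv2d_lowered(image, kernel))
```

Both functions return the same result; the lowered variant trades extra memory for the patch matrix against the ability to reuse an optimized matrix-multiplication kernel, which is the usual motivation for lowering.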
This paper proposes a new method for fast estimation of the steady-state characteristics of single-inductance dc-dc converters with self-heating taken into account. The method is based on a special memoryless convolution algorithm and is described in detail. The theoretical considerations are illustrated with simulation results for buck and boost converters.
This paper deals with three kinds of convolution algorithms designed for the thermal analysis of semiconductor devices and electronic circuits using lumped thermal models. Such models and algorithms are especially convenient for analysing electronic circuits that contain a large number of thermally sensitive devices. The fundamental features of these algorithms (stability, convergence and accuracy) are investigated, using an exponential test function to describe the dissipated power. It is proved analytically that the considered algorithms are stable and convergent. Analytical formulas describing the local and total cut-off error of the algorithms are proposed. The theoretical considerations are accompanied by calculation results illustrating how the values of the thermal model parameters and the size of the analysis step influence the accuracy of calculations carried out with the considered algorithms. (c) 2006 Elsevier Inc. All rights reserved.
In early 2007, a new version of the Anisotropic Analytical Algorithm (AAA) for photon dose calculations was released by Varian Medical Systems for clinical usage on Elekta linacs and also, with some restrictions, for Siemens linacs. Basic validation studies were performed and reported for three beams: 4, 6 and 15 MV for an Elekta Synergy, 6 and 15 MV for a Siemens Primus and, as a reference, for 6 and 15 MV from a Varian Clinac 2100C/D. In general, AAA calculations reproduced the measured data well, and only small deviations were observed for open and wedged fields. On average, PDD curves showed differences between calculation and measurement smaller than 1% or 1.2 mm for Elekta beams, 1% or 1.8 mm for Siemens beams and 1% or 1 mm for Varian beams. Profiles in the flattened region matched measurements with deviations smaller than 1% for Elekta and Varian beams and 2% for Siemens beams. Percentage differences in Output Factors were as small as 1% on average.
This paper surveys algorithms for computing linear and cyclic convolution. Algorithms are presented in a uniform mathematical notation that allows automatic derivation, optimization, and implementation. Using the tensor product and the Chinese remainder theorem, a space of algorithms is defined, and the task of finding the best algorithm is turned into an optimization problem over this space. This formulation led to the discovery of new algorithms with reduced operation count. Symbolic tools are presented for deriving and implementing algorithms. (C) 2003 Elsevier Ltd. All rights reserved.
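For readers unfamiliar with the two operations surveyed here, the sketch below shows the textbook definitions only: a linear convolution, and a cyclic convolution obtained by folding the linear result modulo the sequence length. It is a quadratic-time baseline with hypothetical function names, not one of the reduced-operation-count algorithms the paper derives.

```python
def linear_convolution(x, h):
    """Linear convolution: output length is len(x) + len(h) - 1."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def cyclic_convolution(x, h):
    """Cyclic convolution of two length-n sequences.

    Computed here by wrapping the linear result around modulo n, which
    corresponds to multiplying polynomials modulo z**n - 1.
    """
    n = len(x)
    if len(h) != n:
        raise ValueError("cyclic convolution needs equal-length inputs")
    y = [0] * n
    for k, value in enumerate(linear_convolution(x, h)):
        y[k % n] += value
    return y


print(linear_convolution([1, 2, 3], [4, 5, 6]))  # [4, 13, 28, 27, 18]
print(cyclic_convolution([1, 2, 3], [4, 5, 6]))  # [31, 31, 28]
```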
We develop a performance modeling methodology for product-form circuit-switched networks. These networks allow for: arbitrary topology and link capacities; Poisson and finite population arrivals; multiple classes of calls, each class with a different route and bandwidth requirement; conference as well as point-to-point calls. The methodology is first applied to generalized tree networks, which consist of multiple access links feeding into a common link. Each access link may support multiple 'long-distance' classes (requiring circuits only on the access link and on the common link) and multiple 'local' classes (requiring circuits only on the access link). For generalized tree networks an efficient algorithm is given to determine the blocking probabilities. The methodology is then applied to hierarchical tree networks, where traffic is repeatedly merged in the direction of a root. We also establish a 'Norton' theorem for product-form circuit-switched networks. This theorem implies that for any given calling class, the entire network can be replaced by an Erlang loss system with a state-dependent arrival rate, without modifying the equilibrium probabilities for the particular calling class.
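To make the notion of an Erlang loss system with a state-dependent arrival rate concrete, the following sketch computes the equilibrium occupancy distribution of a single loss link modelled as a birth-death process. It assumes one circuit per call and exponential holding times; the function names and the example rate are hypothetical, and the sketch does not derive the equivalent arrival rate that the theorem provides.

```python
def occupancy_distribution(arrival_rate, capacity, service_rate=1.0):
    """Equilibrium distribution of the number of busy circuits.

    arrival_rate(n) is the call arrival rate when n circuits are busy.
    Each call in progress ends at rate service_rate, so the balance
    equations give p(n+1) = p(n) * arrival_rate(n) / ((n + 1) * service_rate).
    """
    weights = [1.0]
    for n in range(capacity):
        weights.append(weights[-1] * arrival_rate(n) / ((n + 1) * service_rate))
    total = sum(weights)
    return [w / total for w in weights]


def probability_all_busy(arrival_rate, capacity, service_rate=1.0):
    """Probability that every circuit is occupied."""
    return occupancy_distribution(arrival_rate, capacity, service_rate)[-1]


# With a constant arrival rate this reduces to the classical Erlang B formula.
print(probability_all_busy(lambda n: 2.0, capacity=2))
```

With a constant rate of 2 Erlangs offered to 2 circuits, the result matches the Erlang B value (a²/2) / (1 + a + a²/2) = 0.4.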
Implementation of rectangular transforms (r.t.) in modular arithmetic and computation of number theoretic transforms through Winograd's algorithm are discussed. The computational effort of various algorithms to implement real convolution is investigated. Considering the signal/noise ratio performance and hardware complexity, it is shown that the r.t.s are best suited for digital-filtering applications with word lengths less than about 16 bits. Finally, r.t.s are shown to be the most amenable to the application of the Chinese remainder theorem for increasing the dynamic range.
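The role of the Chinese remainder theorem in extending dynamic range can be illustrated with a small sketch: a cyclic convolution is computed separately modulo two small coprime moduli, and the exact integer result is then reconstructed as long as it is smaller than the product of the moduli. The moduli 13 and 17 and all function names are assumptions chosen for illustration, and the residue convolutions are computed directly rather than with rectangular transforms. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
def cyclic_convolution_mod(x, h, modulus):
    """Cyclic convolution with all arithmetic reduced modulo `modulus`."""
    n = len(x)
    return [
        sum(x[i] * h[(k - i) % n] for i in range(n)) % modulus
        for k in range(n)
    ]


def crt_pair(r1, m1, r2, m2):
    """Combine residues r1 mod m1 and r2 mod m2 (m1, m2 coprime)."""
    inverse = pow(m1, -1, m2)  # modular inverse of m1 modulo m2
    return (r1 + m1 * ((r2 - r1) * inverse % m2)) % (m1 * m2)


def cyclic_convolution_crt(x, h, m1=13, m2=17):
    """Exact result provided every output value is below m1 * m2."""
    y1 = cyclic_convolution_mod(x, h, m1)
    y2 = cyclic_convolution_mod(x, h, m2)
    return [crt_pair(a, m1, b, m2) for a, b in zip(y1, y2)]


x = [3, 1, 4, 1]
h = [5, 9, 2, 6]
print(cyclic_convolution_crt(x, h))  # [38, 58, 41, 61]
```

Each residue channel only needs to represent values below 13 or 17, yet the combined result covers the range 0 to 220, which is the sense in which the theorem increases dynamic range.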