Advances in engine control increase the amount of computation required. The production ECU (Electronic Control Unit), which has a single-core architecture, cannot be run at a higher clock speed. Using a multi-/many-core architecture is the only way to decrease execution time. However, implementing engine control software on a multi-/many-core ECU raises various problems. One of the biggest is the sequential structure of the control software, because sequentially structured software can execute on only one core of the multi-/many-core ECU. The purpose of this paper is to describe a parallelized control design method that decomposes the sequential structure and decreases execution time on an embedded multi-/many-core production ECU. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
The successful application of model predictive control (MPC) in fast embedded systems relies on faster and more energy-efficient ways of solving complex optimization problems. A custom quadratic programming (QP) solver implemented on a field-programmable gate array (FPGA) can provide substantial acceleration by exploiting the parallelism inherent in some optimization algorithms, and it opens novel computational opportunities arising from deep pipelining. This paper presents a new MPC algorithm based on multiplexed MPC that can take advantage of the full potential of an existing FPGA design by utilizing the 'free' parallel computational channels that such pipelining provides. The result is greater acceleration than a conventional MPC implementation and reduced silicon usage. The FPGA implementation is shown to be approximately 200x more energy efficient than a high-performance general-purpose processor (GPP) for large control problems.
We investigate the parallel complexity of recognition problems for context-free and regular array (image) sets. We show that the sequential time complexity of the recognition of an $n \times n$ image is $O(n^5)$. The space required for these recognition problems is $O(n^5)$. We prove that there are $\log^2 n$ time parallel algorithms with $BM(n^4)$ and $n^2\,BM(n)$ processors for the recognition of context-free and regular array sets, respectively, where $BM(n)$ is the number of processors sufficient to multiply two boolean $n \times n$ matrices in logarithmic time. We also develop a methodology for processing images using composition systems.
We present a cost-optimal parallel algorithm for the maximum matching problem on bipartite permutation graphs on an EREW PRAM. Previously, Chen and Yesha have dealt with this problem. Their solution relies on Dekel and Sahni's matching algorithm for convex bipartite graphs, which runs in $O(\log^2 n)$ time using $O(n)$ processors. Given a permutation diagram, our algorithm runs in $O(\log n)$ time using $O(n/\log n)$ processors. Our method starts with an easily understood greedy algorithm. We define a nontrivial binary operation which is associative and equivalent to the greedy algorithm. Thus parallel prefix can be applied to the problem.
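The last step of that argument, applying parallel prefix once an associative operation is available, can be illustrated generically. The sketch below does not implement the paper's matching operator, which the abstract does not define. It assumes an arbitrary associative function `op` (a hypothetical parameter) and simulates the rounds of a Hillis-Steele style inclusive scan sequentially. This variant is simpler than a cost-optimal scan and does more total work, so it only shows why associativity is what makes the prefix computation parallelizable.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def inclusive_scan(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Simulate a Hillis-Steele inclusive scan; `op` must be associative."""
    result = list(values)
    step = 1
    while step < len(result):
        # Each position reads only the previous round's array, so on a
        # parallel machine all updates of one round could run at once.
        previous = result
        result = [
            op(previous[i - step], previous[i]) if i >= step else previous[i]
            for i in range(len(previous))
        ]
        step *= 2  # the combined span doubles, giving O(log n) rounds
    return result


if __name__ == "__main__":
    # Running maximum: max is associative.
    print(inclusive_scan([3, 1, 4, 1, 5], max))
    # String concatenation is associative but not commutative,
    # which shows that the scan preserves operand order.
    print(inclusive_scan(["a", "b", "c", "d"], lambda x, y: x + y))
```

The operator never needs to be commutative, only associative, because each round always combines a left block with the block immediately to its right.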
An overview of parallel computing is provided, with reference to numerical analysis and, in particular, to computational electromagnetics. The history of parallelism is reviewed, and the general principles are provided. The two main types of parallelism encountered, pipelining and replication, are discussed, and an example of each is described. A parallel algorithm for forming a matrix-vector product is presented and analyzed. This is then used as the core of a parallel conjugate gradient algorithm. The theoretically predicted efficiency and the measured efficiency are compared. A glossary and a brief discussion of the available literature on parallel processing are included.
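How a matrix-vector product can serve as the core of a conjugate gradient solver is sketched below. This is not the paper's algorithm: it assumes a simple row-block decomposition (a form of replication), a symmetric positive definite matrix, and hypothetical names such as `parallel_matvec` and `workers`. Python threads are used only to make the decomposition explicit; they are not expected to yield a real speedup for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def parallel_matvec(A: Matrix, x: Vector, pool: ThreadPoolExecutor,
                    blocks: int = 2) -> Vector:
    """Compute A @ x by giving each worker a contiguous block of rows."""
    size = max(1, (len(A) + blocks - 1) // blocks)
    chunks = [A[i:i + size] for i in range(0, len(A), size)]
    # Row blocks are independent: each needs only its rows and the full x.
    parts = pool.map(lambda rows: [dot(row, x) for row in rows], chunks)
    return [y for part in parts for y in part]


def conjugate_gradient(A: Matrix, b: Vector, tol: float = 1e-10,
                       max_iter: int = 100, workers: int = 2) -> Vector:
    """Solve A x = b for symmetric positive definite A (assumed, not checked)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            if rs_old ** 0.5 < tol:
                break
            Ap = parallel_matvec(A, p, pool, workers)  # the dominant cost
            alpha = rs_old / dot(p, Ap)
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = [ri - alpha * api for ri, api in zip(r, Ap)]
            rs_new = dot(r, r)
            p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
            rs_old = rs_new
    return x


if __name__ == "__main__":
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The matrix-vector product is the only step per iteration that touches the whole matrix, which is why it is the natural place to parallelize.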
We have developed a mathematical model for video-on-demand server design based on principal component analysis (PCA). Singular value decomposition of the video correlation matrix is used to perform the PCA. The challenge is to counter the computational complexity, which grows proportionally to $n^3$, where $n$ is the number of video streams. We present a solution from high-performance computing, which splits the problem up and computes it in parallel on a distributed memory system.
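To make the link between a correlation matrix and its principal components concrete, the sketch below extracts only the dominant component. It deliberately swaps in power iteration for the full singular value decomposition the abstract mentions; for a symmetric positive semidefinite matrix such as a correlation matrix, the leading singular vector and the leading eigenvector coincide. The function name, the iteration count, and the sample matrix are illustrative assumptions, not data from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dominant_component(C: Matrix, iterations: int = 200) -> Tuple[float, List[float]]:
    """Return (eigenvalue, unit eigenvector) of the largest eigenvalue of C.

    C is assumed symmetric positive semidefinite, e.g. a correlation matrix.
    """
    n = len(C)
    v = [1.0 / math.sqrt(n)] * n  # arbitrary unit start vector
    for _ in range(iterations):
        # One matrix-vector product per iteration; its rows are independent
        # and could be distributed across processes.
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v estimates the eigenvalue (v has unit length).
    Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, Cv)), v


if __name__ == "__main__":
    # Hypothetical correlations: streams 0 and 1 are related, stream 2 is not.
    C = [[1.0, 0.5, 0.0],
         [0.5, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
    value, vector = dominant_component(C)
    print(value, vector)
```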
This paper proposes a parallel algorithm for computing an $N$ ($= k^n$) point Lagrange interpolation on k-ary n-cube networks. The algorithm consists of three phases: initialisation, main and final. There is no computation in the initialisation phase. The main phase is composed of $N/2$ steps, each consisting of four multiplications and four subtractions, and an additional step including one division and one multiplication. Communication in the main phase is based on an all-to-all broadcast algorithm on a Hamiltonian ring embedded in a k-ary n-cube. The final phase is carried out in $n \times \lfloor k/2 \rfloor$ steps, each requiring one addition. A performance evaluation of the proposed algorithm reveals a near-optimum speedup for a typical range of system parameters used in current state-of-the-art implementations. Our study also reveals that, when implementation cost is taken into account, low-dimensional k-ary n-cubes achieve better speedup than their higher-dimensional counterparts.
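For reference, the quantity being computed is the Lagrange interpolating polynomial evaluated at a point. The sketch below is a plain sequential version, not the paper's three-phase network algorithm, and it does not model the k-ary n-cube communication. The function and variable names are illustrative. It is meant only to show that the N terms are mutually independent products that are combined by a final sum.

```python
from typing import Sequence


def lagrange_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate at x the polynomial through the points (xs[i], ys[i]).

    The xs are assumed to be pairwise distinct.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial L_i(x) = prod over j != i of (x - x_j) / (x_i - x_j).
        # Each term depends only on the node set, so terms can be computed
        # independently of one another.
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis  # summing the terms is the only combining step
    return total


if __name__ == "__main__":
    # Samples of x^2 + x + 1 at x = 0, 1, 2.
    print(lagrange_interpolate([0.0, 1.0, 2.0], [1.0, 3.0, 7.0], 3.0))
```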
The adaptive BDDC method is extended to the selection of face constraints in three dimensions. A new implementation of the BDDC method is presented based on a global formulation without an explicit coarse problem, with massive parallelism provided by a multifrontal solver. Constraints are implemented by a projection and sparsity of the projected operator is preserved by a generalized change of variables. The effectiveness of the method is illustrated on several engineering problems. (c) 2011 IMACS. Published by Elsevier B.V. All rights reserved.
Equations of equilibrium arise in numerous areas of engineering. Applications to electrical networks, structures, and fluid flow are elegantly described by Strang in Introduction to Applied Mathematics (Wellesley-Cambridge Press, Wellesley, MA, 1986). The context in which equilibrium equations arise may be stated in two forms:
We describe an alternative implementation of Atallah and Vishkin's parallel algorithm for finding an Euler tour of a graph. Instead of finding a spanning tree as an intermediate step, this algorithm is based on identifying a strut, which is easier to compute. Using the strut, vertices that have more than one circuit passing through them are identified directly. Stitching at such vertices reduces the number of circuits in the Euler partition.