ISBN: 0780342291 (print)
In order to generate local addresses for an array section A(l:h:s) with block-cyclic distribution, an efficient compiling method is required. In this paper, two local address generation methods for the block-cyclic distribution are presented. One is a simple local address generation method that is modified from the virtual-block scheme. The other is a linear-time ΔM table construction method. The array elements of A(l:h:s) to be accessed at run-time build up a family of lines. By using the equation of the lines, a ΔM table can be generated in O(k) time. Experimental results show that the simple local address generation method performs poorly, whereas the linear-time ΔM table generation method is faster than other algorithms in both ΔM table generation time and access time for 10,000 array elements.
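To make the terms in the abstract above concrete, the following is a minimal brute-force sketch of what a ΔM table contains. It is not the paper's O(k) construction. It assumes 0-based global indices, a positive stride, an inclusive upper bound h, and a cyclic(k) distribution over p processors; the function names `local_address` and `delta_m_table` are hypothetical.

```python
def local_address(g, p, k):
    """Map global index g to (owner, local offset) under cyclic(k) over p processors."""
    block = g // k                      # which block of size k holds g
    owner = block % p                   # blocks are dealt round-robin to processors
    offset = (block // p) * k + g % k   # position inside the owner's local memory
    return owner, offset


def delta_m_table(l, h, s, p, k, m):
    """Brute force: local addresses of A(l:h:s) on processor m and the gaps between them.

    This scans every section element, so it costs O((h - l) / s) time. The abstract's
    linear-time method avoids this scan; it is not reproduced here.
    """
    addrs = []
    for g in range(l, h + 1, s):        # h is inclusive, as in Fortran sections
        owner, offset = local_address(g, p, k)
        if owner == m:
            addrs.append(offset)
    gaps = [b - a for a, b in zip(addrs, addrs[1:])]
    return addrs, gaps
```

For example, `delta_m_table(0, 31, 3, 2, 4, 0)` lists the local addresses that processor 0 touches for A(0:31:3) with p = 2 and k = 4, together with the memory gaps between consecutive accesses.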
The CAPSE environment for Computer Aided Parallel Software Engineering is intended to assist the developer in the crucial task of parallel programming. The methodology of CAPSE is based on direct manipulative graphical creation and editing of scalable workload characterizations of MIMD algorithms. This paper presents the basic concepts of this methodology and an example of a parallel Poisson solver. The workload characterization representing the computation and communication behavior of the algorithm is based on directed acyclic task graphs, which achieve scalability by composing the task graph of scalable basic patterns instead of single nodes and arcs. The composition and the usage of these basic patterns are described in the light of designing the Poisson solver algorithm. The resulting task graph is used to predict the program's performance on an nCUBE 2 distributed memory machine and the PAPS simulator.
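As a rough illustration of how a directed acyclic task graph with computation and communication weights can yield a runtime estimate, the sketch below computes the critical-path length. It is not CAPSE's prediction method. It assumes unlimited processors, one task per node, and that every arc carries a fixed message time; the name `critical_path` and the input layout are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(compute, comm):
    """Length of the longest path through a task graph.

    compute[n]   : computation time of task n
    comm[(u, v)] : communication time on the arc u -> v
    """
    # Build the predecessor map that TopologicalSorter expects.
    preds = {n: set() for n in compute}
    for u, v in comm:
        preds[v].add(u)

    finish = {}
    for n in TopologicalSorter(preds).static_order():
        # A task starts once every incoming message has arrived.
        start = max((finish[u] + comm[(u, n)] for u in preds[n]), default=0)
        finish[n] = start + compute[n]
    return max(finish.values())
```

With a small fork-join graph such as `compute = {"split": 1, "a": 4, "b": 2, "merge": 1}` and arcs from `split` to both workers and from both workers to `merge`, the result is a lower bound on the execution time under the stated assumptions.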
Adjustable speed drive systems using AC‐machines are often fed by voltage‐source inverters. This paper presents a new strategy to control the line‐side converter. State‐space and fuzzy methods are combined to achi...
Author: Krebs, S.
Dr.-Ing. Stephan Krebs (1963) received his Diploma degree in 1960 and his Dr.-Ing. degree in 1994 from the Elektrotechnisches Institut, Universität Karlsruhe/Germany. His main fields of interest are the use of modern time-discrete control methods and parallel signal processing in adjustable speed drive and power supply systems. In summer 1994 he joined the Power Group of the Department of Electrical and Computer Engineering of the University of Toronto/Canada, where he is currently working on an industrial controller platform for power applications. (Department of Electrical and Computer Engineering, University of Toronto, 10 King's College Rd., Toronto/Ontario, Canada M5S 1A4, T +416/978-6618, Fax +416/971-2325)
The cascaded doubly-fed machine consists of two separate induction machines which are interconnected by their rotor windings. This paper presents a general control structure for such machines, based on the principle of...
In order to solve the speed problem and the shallow reasoning problem encountered in current research on fault diagnosis expert systems, this paper presents a model-based parallel fault diagnosis expert system for energy managemen...
The problem of minimizing the execution time of programs within a heterogeneous environment is considered. Different computational characteristics within a parallel algorithm may make switching execution from one machine to another beneficial; however, the cost of switching between machines during the execution of a program must be considered. This cost is not constant, but depends on data transfers needed as a result of the move. Therefore, determining a minimum-cost assignment of machines to program segments is not straightforward. A previously presented block-based mode selection (BBMS) approach is used as a basis to develop a heuristic method for assigning machines to program segments of data-parallel algorithms. Simulation results of parallel program behavior using the heuristic indicate that good assignments are possible without resorting to exhaustive search techniques.
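The cost model in this abstract (per-segment execution time plus a switching cost that depends on where the move happens) can be illustrated with a small sketch. This is not the BBMS-based heuristic from the paper. It assumes the program is a simple linear chain of segments, for which an exact dynamic program suffices; the function name `min_cost_assignment` and the `switch_cost` callback signature are hypothetical.

```python
def min_cost_assignment(exec_time, switch_cost):
    """Assign one machine to each segment of a linear chain at minimum total cost.

    exec_time[i][m]      : time of segment i on machine m
    switch_cost(i, a, b) : cost of moving from machine a to machine b just before
                           segment i (may depend on i, since data transfers differ)
    Returns (total_cost, [machine for each segment]).
    """
    n, k = len(exec_time), len(exec_time[0])
    best = list(exec_time[0])   # best[m]: cheapest cost so far, ending on machine m
    back = []                   # back[i-1][b]: predecessor machine chosen for b

    for i in range(1, n):
        new_best, choice = [], []
        for b in range(k):
            a = min(range(k), key=lambda a: best[a] + switch_cost(i, a, b))
            new_best.append(best[a] + switch_cost(i, a, b) + exec_time[i][b])
            choice.append(a)
        best = new_best
        back.append(choice)

    # Recover the assignment by walking the predecessor table backwards.
    m = min(range(k), key=lambda m: best[m])
    total, path = best[m], [m]
    for choice in reversed(back):
        m = choice[m]
        path.append(m)
    return total, path[::-1]
```

When switching is expensive relative to the gain in execution time, the result stays on one machine; when it is cheap, the result alternates to whichever machine suits each segment. Programs with branching or loops need more than this chain model, which is where a heuristic such as the one described becomes relevant.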
A framework for estimating the relative execution time of a data-parallel algorithm in an environment capable of the SIMD and SPMD (Single Program - Multiple Data) modes of computation is presented. Given a data-paral...
For real-time radar processing, it is very desirable to have an algorithm that does not assume restricted statistics of the input data and can be implemented for high-speed processing (without a high cost) to meet real-time requirements. We therefore apply the QR decomposition-based least-squares method for linear prediction to the problem of computing the reflection coefficients of a lattice predictor, instead of using the conventional Burg algorithm. We also propose a modified one-dimensional ring architecture for implementing the QR method of least-squares. The particular application considered in this case is that of surveillance radar systems for air traffic control.
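The core numerical step named in this abstract, least-squares linear prediction solved through a QR decomposition, can be sketched in plain Python with Givens rotations. This is only an illustration under stated assumptions: it uses real-valued data, it returns forward prediction coefficients rather than lattice reflection coefficients, and it does not model the ring architecture. The names `givens_lstsq` and `linear_predictor` are hypothetical.

```python
import math


def givens_lstsq(A, b):
    """Solve min ||A w - b||_2 by Givens-rotation QR followed by back substitution."""
    rows = [list(r) + [y] for r, y in zip(A, b)]   # augmented matrix [A | b]
    m, n = len(rows), len(rows[0]) - 1
    for j in range(n):
        for i in range(j + 1, m):
            x, y = rows[j][j], rows[i][j]
            r = math.hypot(x, y)
            if r == 0.0:
                continue                            # nothing to eliminate
            c, s = x / r, y / r
            # Rotate rows j and i so that the entry rows[i][j] becomes zero.
            for col in range(j, n + 1):
                u, v = rows[j][col], rows[i][col]
                rows[j][col] = c * u + s * v
                rows[i][col] = -s * u + c * v
    # The top n rows now hold the upper-triangular system R w = Q^T b.
    w = [0.0] * n
    for j in reversed(range(n)):
        acc = rows[j][n] - sum(rows[j][q] * w[q] for q in range(j + 1, n))
        w[j] = acc / rows[j][j]                     # assumes A has full column rank
    return w


def linear_predictor(x, order):
    """Least-squares coefficients w with x[t] ~ sum_k w[k-1] * x[t-k], k = 1..order."""
    A = [[x[t - k] for k in range(1, order + 1)] for t in range(order, len(x))]
    b = [x[t] for t in range(order, len(x))]
    return givens_lstsq(A, b)
```

Givens rotations are used here because each rotation touches only two rows, which is the property that makes QR methods attractive for array-style hardware; whether the paper's ring design uses exactly this ordering is not stated in the abstract.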
An analytical study of potential pathological performance areas of the Seamless architecture is presented. Seamless is a latency-tolerant, distributed memory, multiprocessor architecture. A key component of the philosophy of Seamless, however, is the use of standard, commodity components for a large part of the system. A discussion of the unavoidable implementation compromises imposed by this decision is presented, followed by a summary of some optimistic performance studies. Then an analytical study that parameterizes and predicts the worst-case impact of using standard components is provided. Finally, it is shown that these bottlenecks are manageable via careful generation of target machine code so that the optimistic performance studies become realistic expectations for a range of program behaviors and granularities.