In a dedicated mixed-machine heterogeneous computing (HC) system, an application program may be decomposed into subtasks, with each subtask then assigned to the machine best suited to execute it. Subtask data relocation is defined as selecting the sources from which subtasks obtain their needed data items. This study focuses on theoretical issues for data relocation using a stochastic HC model. It is assumed that multiple independent subtasks of an application program can be executed concurrently on different machines whenever possible. A stochastic model for HC is proposed, in which the computation times of subtasks and communication times for inter-machine data transfers can be random variables. The optimization problem of finding the optimal matching, scheduling, and data relocation schemes to minimize the total execution time of an application program is defined based on this stochastic HC model. The optimization criteria and search space for this problem are described. It is proven that a greedy-algorithm-based approach will generate the optimal data relocation scheme with respect to any fixed matching and scheduling schemes. This result indicates that a greedy-algorithm-based approach is the best strategy for developing data relocation heuristics in practice.
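As a rough illustration of what a greedy source selection could look like, the sketch below picks, for each needed data item, the machine whose copy is expected to arrive soonest. It is not the algorithm from the paper: the function name `greedy_sources`, the input layout, and the use of expected times in place of random variables are all assumptions made for this example, and link contention is ignored.

```python
from typing import Dict, Tuple

# Hypothetical input layout (an assumption for this sketch):
# copies[item][machine] = (expected time the copy is ready on that machine,
#                          expected transfer time from that machine to the destination)
Copies = Dict[str, Dict[str, Tuple[float, float]]]


def greedy_sources(copies: Copies) -> Dict[str, Tuple[str, float]]:
    """For each data item, greedily choose the source with the earliest expected arrival."""
    plan = {}
    for item, by_machine in copies.items():
        # Expected arrival = ready time + transfer time; ties go to the first machine listed.
        best = min(by_machine, key=lambda m: sum(by_machine[m]))
        plan[item] = (best, sum(by_machine[best]))
    return plan


example = {
    "d0": {"m1": (4.0, 3.0), "m2": (5.0, 1.0)},
    "d1": {"m1": (0.0, 2.0)},
}
plan = greedy_sources(example)
```

Here `plan["d0"]` selects `m2`, because its copy is ready later but transfers faster, giving the earlier expected arrival.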
Motion tracking using an active camera is a very computationally complex problem. Existing serial algorithms have provided frame rates that are much lower than those desired, mainly because of the lack of computational resources. Parallel computers are well suited to image processing tasks and can provide the computational power that is required for real-time motion tracking algorithms. This paper develops a parallel implementation of a known serial motion tracking algorithm, with the goals of achieving greater than real-time frame rates and studying the effects of data layout, choice of parallel mode of execution, and machine size on the execution time of this algorithm. A distinguishing feature of this application study is that the portion of each image frame that is relevant changes from one frame to the next based on the camera motion. This impacts the effect of the chosen data layout on the needed inter-processor data transfers and the way in which work is distributed among the processors. Experiments were performed to determine which data layout performs better for which image sizes and numbers of processors. The parallel computers used in this study are the MasPar MP-1, Intel Paragon, and PASM. Different modes are examined and it is determined that mixed mode is faster than SIMD or MIMD implementations.
ISBN: (print) 0780342291
In order to generate local addresses for an array section A(l:h:s) with block-cyclic distribution, an efficient compiling method is required. In this paper, two local address generation methods for the block-cyclic distribution are presented. One is a simple local address generation method that is modified from the virtual-block scheme. The other is a linear-time ΔM table construction method. The array elements of A(l:h:s) to be accessed at run-time build up a family of lines. By using the equation of the lines, a ΔM table can be generated in O(k) time. Experimental results show that the simple local address generation method has poor performance, but the linear-time ΔM table generation method is faster than other algorithms in ΔM table generation time and access time for 10,000 array elements.
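To make the terms concrete, the sketch below enumerates the local addresses that one processor touches for A(l:h:s) and the gaps between them. It is a brute-force reference, not the O(k) line-equation method the abstract describes. It assumes zero-based indices, a positive stride s, an inclusive upper bound h, and the usual cyclic(k) layout over p processors; the function names are hypothetical. The gap list it returns is the full sequence, not the compact periodic ΔM table.

```python
def owner_and_local(g, p, k):
    """Map global index g to (processor, local address) under cyclic(k) over p processors."""
    block = g // k                      # which block of size k holds g
    return block % p, (block // p) * k + g % k


def local_addresses_and_gaps(l, h, s, p, k, proc):
    """Enumerate A(l:h:s) and keep the elements owned by `proc`.

    Returns the local addresses in access order and the gaps between
    successive ones (the memory strides a ΔM table would encode).
    """
    addrs = []
    for g in range(l, h + 1, s):        # h is inclusive, s > 0 assumed
        owner, local = owner_and_local(g, p, k)
        if owner == proc:
            addrs.append(local)
    gaps = [b - a for a, b in zip(addrs, addrs[1:])]
    return addrs, gaps


# Two processors, block size 4, section A(0:30:3).
addrs0, gaps0 = local_addresses_and_gaps(0, 30, 3, p=2, k=4, proc=0)
addrs1, gaps1 = local_addresses_and_gaps(0, 30, 3, p=2, k=4, proc=1)
```

A brute-force enumeration like this costs time proportional to the section length, which is why a table built directly from the line equations in O(k) time is attractive; the sketch is mainly useful as a correctness check for a faster generator.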
Heterogeneous computing covers a great variety of situations. This study focuses on a particular application domain (iterative automatic target recognition tasks) and an associated specific class of dedicated heterogeneous hardware platforms. The contribution of this paper is that, for the computational environment considered, it presents a methodology for real-time on-line input-data dependent remappings of the application subtasks to the processors in the heterogeneous hardware platform using previously stored off-line statically determined mappings. That is, the operating system will be able to decide during the execution of the application whether or not to perform a remapping based on information generated by the application from its input data. If the decision is to remap, the operating system will be able to select a previously derived and stored mapping that is appropriate for the given state of the application (e.g., the number of objects it is currently tracking).
This is a very informal introduction to the 1996 ICPP Workshop on Challenges for Parallel Processing. This workshop is held in conjunction with the 1996 International Conference on Parallel Processing (ICPP). The purp...
This is a study of the performance on different parallel machines of the solution to the system of linear equations that results from the finite-differencing of the neutron diffusion equation in the context of nuclear...
Discusses the advantages of computing with heterogeneous parallel machines, and examines the research challenges for automating the use of such systems. One type of heterogeneous computing system is a mixed-mode machine, where a single machine can operate in different modes of parallelism. Another is a mixed-machine system, where a suite of different kinds of parallel machines is interconnected by high-speed links. To exploit such systems, a task must be decomposed into subtasks, where each subtask is computationally homogeneous. The subtasks are then assigned to and executed on the machines (or modes) that will result in a minimal overall execution time. Typically, users must specify this decomposition and assignment. One long-term pursuit in heterogeneous computing is to do this automatically. An overview of a conceptual model of what this involves is given. As an example of the research in this area, a genetic-algorithm-based approach to the subtask assignment and scheduling problem is explored. Open problems in heterogeneous computing are described.
PASM is a concept for a parallel processing system that allows experimentation with different architectural design alternatives. PASM is dynamically reconfigurable along three dimensions: partitionability into independent or communicating submachines, variable interprocessor connections, and mixed-mode SIMD/MIMD parallelism. With mixed-mode parallelism, a program can switch between SIMD (synchronous) and MIMD (asynchronous) parallelism at instruction-level granularity, allowing the use of both modes in a single machine. The PASM concept is presented, showing the ways in which reconfiguration can be accomplished. Trade-offs among SIMD, MIMD, and mixed-mode parallelism are explored. The small-scale PASM prototype with 16 processing elements is described. The ELP mixed-mode programming language used on the prototype is discussed. An example of a prototype-based study that demonstrates the potential of mixed-mode parallelism is given.
To ensure that the parallel implementation of an algorithm performs to its maximum potential, knowledge of the specific parallel machine being used is required. The mapping of gray-scale morphological operators and a filter onto SIMD, MIMD, and mixed-mode environments is analyzed. The matching of several algorithmic techniques to machine features is examined analytically and experimentally. Issues considered include concurrent execution of subtasks, data layout, choice of data transfer protocols, and the mode of parallelism used. Experiments are performed using the MIMD Intel Paragon, SIMD MasPar MP-1, and the mixed-mode PASM prototype. The analytical results and experimental procedures can be applied to other systems as well.