Consider the problem of assigning N software versions of a multiversion software to M processors for execution. When a processor completes executing a software version, it sends the output to a voter immediately. The ...
详细信息
Consider the problem of assigning N software versions of a multiversion software to M processors for execution. When a processor completes executing a software version, it sends the output to a voter immediately. The voter executes a voting strategy to estimate the correct output. When ii has made a sufficiently reliable estimation (e.g., it has received [(N/2)] identical outputs under majority voting), it accepts this estimated output and terminates the execution of all the unfinished versions. Therefore, some software versions may not be executed to completion. In this paper, we analyze the mean time to reach correct consensus for four voting strategies. To minimize the mean time to reach correct consensus, we show that the processor assignment problem is NP-hard and we propose a heuristic to find suboptimal assignments. When two or more versions are assigned to a processor, these versions are executed one after the other and we derive the optimal execution sequence for them.
An important issue for the efficient use of multiprocessor systems is the assignment of parallel processors to nested parallel loops. It is desirable for a processor assignment algorithm to be fast and always generate...
详细信息
An important issue for the efficient use of multiprocessor systems is the assignment of parallel processors to nested parallel loops. It is desirable for a processor assignment algorithm to be fast and always generate an optimal processor assignment. In this paper, we propose two efficient algorithms to decide the optimal number of processors assigned to each individual loop. Efficient parallel counterparts of these two algorithms are also presented. These algorithms not only always generate an optimal processor assignment, but also are much faster than the existing optimal algorithm in the literature. Next, we try to improve the performance of parallel execution by transforming a nested parallel loop into a semantically equivalent one. Three loop transformations are investigated. The effect of these transformations on processor assignment is discussed. Experiments are conducted to show the advantage of applying these transformations. It is observed that, in most cases, the parallel execution time is improved after applying these transformations.
A technique for scheduling and processor allocation leading to the synthesis of integrated heterogeneous pipelined processing elements, implementing digital signal processing applications, is proposed. The proposed te...
详细信息
A technique for scheduling and processor allocation leading to the synthesis of integrated heterogeneous pipelined processing elements, implementing digital signal processing applications, is proposed. The proposed technique achieves efficient hardware implementations at the logic-level by minimizing the number of processing units used, without compromising the rate and delay optimality criteria. The proposed algorithm is found to outperform algorithms resulting in homogeneous implementations, as it gives schedules with lower iteration periods, requires less hardware resources, and has lower time complexity at design time. In comparison with the already existing heterogeneous algorithms, the proposed algorithm produces schedules of lower time complexity and lower iteration period for some applications. The optimal performance of the proposed algorithm has been verified on several benchmarks. (c) 2005 Elsevier Ltd. All rights reserved.
We present here an efficient systolic implementation for 3-D IIR digital filters. The systolic implementation is obtained by using an algebraic mapping technique. This new mapping technique gives us the choice to mix ...
详细信息
We present here an efficient systolic implementation for 3-D IIR digital filters. The systolic implementation is obtained by using an algebraic mapping technique. This new mapping technique gives us the choice to mix pipelined variables and broadcast variables. We also determine, through the mapping method, the buffer sizes, the direction of variables propagations and the data feeding and extracting points. The resultant systolic array implementation is a modular structure composed of 2-D filter modules connected by simple buffers. This new systolic implementation is regular, modular and amenable to VLSI implementation.
A novel probabilistic framework for the study of performance issues in multiprocessor scheduling of tasks with uncertain characteristics is proposed. It enables the determination of various performance measures, such ...
详细信息
A novel probabilistic framework for the study of performance issues in multiprocessor scheduling of tasks with uncertain characteristics is proposed. It enables the determination of various performance measures, such as the overall completion rate from a probabilistic picture of the tasks currently under execution, formed from a probabilistic description of newly arriving tasks and assignment of processors. Once optimised, these measures may be used to guide the decisions of schedulers at run time.
Partitioning of processors on a multiprocessor system involves logically dividing the system into processor partitions. Programs can be executed in the different partitions in parallel. Optimally setting the partition...
详细信息
Partitioning of processors on a multiprocessor system involves logically dividing the system into processor partitions. Programs can be executed in the different partitions in parallel. Optimally setting the partition size can significantly improve the throughput of multiprocessor systems. The speedup characteristics of parallel programs can be defined by execution signatures. The execution signature of a parallel program on a multiprocessor system is the rate at which the program executes in the absence of other programs and depends upon the number of allocated processors, the specific architecture, and the specific program implementation. Based on the execution signatures, this paper analyzes simple Markovian models of dynamic partitioning. From the analysis, when there are at most two multiprocessor partitions, the optimal dynamic partition size can be found which maximizes throughput. Compared against other partitioning schemes, the dynamic partitioning scheme is shown to be the best in terms of throughput when thereconfiguration overhead is low. If the reconfiguration overhead is high, dynamic partitioning is to be avoided. An expression for the reconfiguration overhead threshold is derived. A general iterative partitioning technique is presented. It is shown that the technique gives maximum throughput forn partions.
暂无评论