A processor array with a reconfigurable bus system (abbreviated to PARBS) is a computation model which consists of a processor array and a reconfigurable bus system. It is a very powerful computation model in that it ...
详细信息
A processor array with a reconfigurable bus system (abbreviated to PARBS) is a computation model which consists of a processor array and a reconfigurable bus system. It is a very powerful computation model in that it possesses the ability to solve many problems efficiently. However, most existing efficient algorithms on PARBS's use a large number of processors to solve problems. For example, to determine the maximum (minimum) of n data items in O(1) time, O(n2) processors are required [12]. To solve the all-pairs shortest paths and the minimum spanning tree problems in O(log n) time, O(n4) processors are required [20]. These networks will therefore become very expensive for large n. In this paper, we introduce the concept of iterative-PARBS, which is similar to the FOR-loop construct in sequential programming languages. The iterative-PARBS is a building block through which the processing data can be routed several times. We can think of it as a ''hardware subroutine.'' Based on this scheme, it is possible to explore more cost-effective, time-efficient parallel algorithms for use in a PARBS. The following new results are derived in this study: 1) The minimum (maximum) of n data items can be determined in O(1) time on a PARBS with O(n1+epsilon) processors for any fixed epsilon > 0. 2) The all-pairs shortest paths and the minimum spanning tree problems can be solved in O(log n) time on a PARBS with O(n3+epsilon) processors for any fixed epsilon > 0.
Memory Sharing processor array (MSPA) architecture has been developed as an effective array processing architecture for both reduced data storages and increased processor cell utilization efficiency[ll. In this paper,...
详细信息
Memory Sharing processor array (MSPA) architecture has been developed as an effective array processing architecture for both reduced data storages and increased processor cell utilization efficiency[ll. In this paper, the MSPA design methodology is extended to the VLSI synthesis of a serial input processor array (PA). Then, a new bit-serial input multiplier and a new data serial input matrix multiplier are derived from the new PA. These multipliers are superior to the conventional multipliers by their smaller number of logic-gate count.
To enhance fabrication yield for processor arrays, many reconfiguration schemes for replacing faulty processing elements (PE's) with spare PE's have been proposed. An array grid model based on single-track swi...
详细信息
To enhance fabrication yield for processor arrays, many reconfiguration schemes for replacing faulty processing elements (PE's) with spare PE's have been proposed. An array grid model based on single-track switches is one of such models. For this model, some algorithms for reconfiguring processor arrays have been proposed. However, any algorithm which can reconfigure the array, whenever the array is reconfigurable, has not been proposed as yet. This paper describes reconfiguration methods of processor arrays with faulty PE's. The methods use indirect replacements for reconfiguring arrays. First, we introduce a concept of fatal fault pattern, which makes an array unreconfigurable. Then, for the reconfiguration method with fixed spare arrangement, efficient spare arrangements are given by evaluating the probability of an occurring fatal fault pattern. Further, we present reconfiguration algorithm with relocating spare. In the algorithm, fatal fault patterns are eliminated by relocating spare. Computer simulations show that the method has good performance of reconfiguration.
List ranking finds for each cell in a linked list the number of cells that precede it in the list. This paper presents a work-efficient list-ranking algorithm for fine-grained processor arrays. This algorithm runs on ...
详细信息
List ranking finds for each cell in a linked list the number of cells that precede it in the list. This paper presents a work-efficient list-ranking algorithm for fine-grained processor arrays. This algorithm runs on an array of n/ log(2) n processors with the expected run-time of O(log(2) n). As list ranking is highly communication intensive, the proposed algorithm is able to reduce communication cost among processors by assigning sublists, instead of arbitrary cells, of a linked list to each processor. The proposed algorithm is also capable of keeping all processors busy during the whole list-ranking process in order to utilize all processors efficiently. (C) 2000 Elsevier Science Inc. All rights reserved.
The paper deals with systematic methods for analyzing and designing selftimed regular arrays of processors. Methods are presented for deriving measures of efficiency and for verifying the computational behavior of a g...
详细信息
The paper deals with systematic methods for analyzing and designing selftimed regular arrays of processors. Methods are presented for deriving measures of efficiency and for verifying the computational behavior of a given array. It is shown that the optimization of a given regular selftimed processor array, with respect to its processor utilization, can be given mathematically in the form of linear programs or integer linear programming problems, whose sizes are independent of the size of the array.
In a multiprocessor array, some processing elements (PEs) fail to function normally due to hardware defects or soft faults caused by overheating, overload or occupancy by other running applications. Fault-tolerant rec...
详细信息
In a multiprocessor array, some processing elements (PEs) fail to function normally due to hardware defects or soft faults caused by overheating, overload or occupancy by other running applications. Fault-tolerant reconfiguration reorganizes fault-free PEs to a new regular topology by changing the interconnection among PEs. This paper investigates the problem of constructing as large as possible logical array with short interconnects from a physical array with faults. A flexible rerouting scheme is developed to improve the efficiency of utilizing fault-free PEs. Under the scheme, two efficient reconfiguration algorithms are proposed. The first algorithm is able to generate the maximum logical array (MLA) in linear time. The second algorithm reduces the interconnect length of the MLA, and it is capable of producing nearly optimal logical arrays in comparison to the lower bound of the interconnect length, that is also proposed in this paper. Experimental results validate the efficiency of the flexible rerouting schemes and the proposed algorithms. For 128 x 128 host arrays with 30% unavailable PEs, the proposed approaches improve existing algorithm up to 44% in terms of logical array size, while reducing the interconnection redundancy by 49.6%. In addition, the proposed algorithms are more scalable than existing approaches. On host arrays with 50% unavailable PEs, our algorithms can produce logical arrays with harvest over 56% while existing approaches fail to construct a feasible logical array. (C) 2014 Elsevier Inc. All rights reserved.
Effective fault tolerance techniques are essential for improving the reliability of multiprocessor systems. At the same time, fault tolerance must be achieved at high speed to meet the real-time constraints of embedde...
详细信息
Effective fault tolerance techniques are essential for improving the reliability of multiprocessor systems. At the same time, fault tolerance must be achieved at high speed to meet the real-time constraints of embedded systems. While parallelism has often been exploited to increase performance, to the best of our knowledge, there has been no previously reported work on parallel reconfiguration of mesh-connected processor arrays with faults. This paper presents two parallel algorithms to accelerate reconfiguration of the processor arrays. The first algorithm reconfigures a host array in parallel in a multithreading manner. The threads in the parallel algorithm execute independently within a safe rerouting distance. The second algorithm is based on a divide-and-conquer approach to first generate the leftmost segments in parallel and then merge the segments in parallel. When compared to the conventional algorithm, simulation results from a large number of instances confirm that the proposed algorithms significantly accelerate the reconfiguration without loss of harvest.
Particle filters are a state-of-the-art method for the state estimation of non-linear stochastic systems. Recent many-core architectures and cellular processor arrays offer a new paradigm for algorithm development, wh...
详细信息
Particle filters are a state-of-the-art method for the state estimation of non-linear stochastic systems. Recent many-core architectures and cellular processor arrays offer a new paradigm for algorithm development, which provides not only high performance, but also theoretical advances for parallel implementations. We have developed a new variant of the particle filter algorithm, which suits ideally implementation on a cellular processor array. The new algorithm often performs better than the classical one and a significant gain in running time can be achieved, especially when there is a large number of particles to be simulated. (C) 2012 Elsevier B.V. All rights reserved.
To enhance fabrication yield for processor arrays, many reconfiguration schemes for replacing faulty processing elements (PE's) with spare PE's have been proposed. An array grid model based on single-tracks is...
详细信息
To enhance fabrication yield for processor arrays, many reconfiguration schemes for replacing faulty processing elements (PE's) with spare PE's have been proposed. An array grid model based on single-tracks is one of such models. For this model, some algorithms for reconfiguring processor arrays have been proposed. However, an algorithm which can reconfigure the array, whenever the array is reconfigurable, has not been proposed yet. This paper presents two types of methods for reconfiguration of processor arrays. Both the types use indirect replacements for reconfiguring arrays. For an indirect replacement of a faulty non-spare PE, one has a fixed direction, the other has at most four directions among which one is chosen. For the former, we consider the several distribution of spare PE's, and computer simulations show a tendency in the term of difference in the distributions. The latter algorithms consist of two phases. In the first phase, rows and columns of spare PE's are decided in accordance with a rule. Several rules for deciding spare PE's are considered in this paper. In the second phase, faulty non-spare PE's are replaced with healthy spare PE's. By simulations the performance of the algorithms are evaluated and a tendency is shown in the terms of difference in disposition of spare PE's.
dBus-array(k, n) is an n-dimensional processor array of k(n) nodes connected via k(n-1) dBuses. A dBus is a unidirectional bus which receives signals from a set of k nodes (input set), and transmits signals to a diffe...
详细信息
dBus-array(k, n) is an n-dimensional processor array of k(n) nodes connected via k(n-1) dBuses. A dBus is a unidirectional bus which receives signals from a set of k nodes (input set), and transmits signals to a different set of k nodes (output set). Two optical implementations of the dBus-array(k, n) are discussed. One implementation uses the wavelength division multiplexing as in the wavelength division multiple access channel hypercube WMCH [7]. WMCH(k, n) and dBus-array(k, n) have the same diameter and about the same average internode distance, while the dBus-array requires only one tunable transmitter/receiver per node, compared to n tunable transmitters/receivers per node for the WMCH. The other implementation uses one fixed-wavelength transmitter/receiver per node and the dilated slipped banyan switching network (DSB) [17] to combine time division and wavelength division multiplexing.
暂无评论