This paper presents a parallel sorting algorithm that sorts n elements in O(n/w + n log n/p) time using p (≤ n) processors arranged in a 1-dimensional grid with w (≤ n^(1-ε)) buses, for every fixed ε > 0. Furthermore, it is shown that n × p elements can be sorted in O(n/w + n log n/p) time on p × p (p ≤ n) processors arranged in a 2-dimensional grid with w (≤ n^(1-ε)) buses in each column and in each row. These algorithms are optimal because their time complexities match the lower bounds.
A reconfiguration method for processor arrays is proposed in this paper. In this method, a genetic algorithm (GA) is used to search for an effective spare arrangement, one that leads to successful reconfiguration. The effectiveness of the method is demonstrated by computer simulations.
In this correspondence, we propose to set up assignment rules based on the available interconnection resources of 2-D processor arrays. The reconfiguration of 2-D processor arrays is guided by the assignment rules, such that logical cells are always connected through the available interconnection resources. The advantage of the proposed approach is that it adapts easily to various system requirements through adjustment of the assignment rules. The proposed reconfiguration algorithm can be extended to arrays with clustered faulty cells, overcoming the weakness of fixed domain approaches.
In this paper, we present an O(1) time algorithm to solve the minimum coloring problem defined on a set of intervals, which is also called the channel assignment problem. This problem has not been solved in O(1) time before, even on the idealistic CRCW PRAM model. For large-sized problems, it is desirable to have fast hardware solutions. Our algorithm is based on the processor array with a reconfigurable bus system (abbreviated to PARBS), which consists of a processor array and a reconfigurable bus system. To solve this problem with constant time complexity, we first transform the "left-edge" channel assignment algorithm into the parenthesis-matching problem. Based on this novel scheme, we are able to develop constant-time parallel algorithms that solve the minimum coloring problem for n intervals on a PARBS with O(n^2) processors.
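For orientation, the classic sequential left-edge procedure that the abstract takes as its starting point can be sketched as follows. This is only a minimal sequential illustration, not the constant-time PARBS algorithm of the paper. The function name `left_edge` and the convention that intervals are closed (so intervals sharing an endpoint conflict) are assumptions made for this example.

```python
def left_edge(intervals):
    """Sequential left-edge channel assignment (illustrative sketch).

    intervals: list of (left, right) pairs, treated as closed intervals.
    Returns (track_of, track_count), where track_of[i] is the track
    (color) given to intervals[i].
    """
    # Process intervals in order of their left endpoints.
    remaining = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    track_of = [None] * len(intervals)
    track = 0
    while remaining:
        last_right = None   # right end of the last interval placed in this track
        deferred = []       # intervals that must wait for a later track
        for i in remaining:
            left, right = intervals[i]
            if last_right is None or left > last_right:
                track_of[i] = track
                last_right = right
            else:
                deferred.append(i)
        remaining = deferred
        track += 1
    return track_of, track
```

For the hypothetical input `[(1, 4), (2, 6), (5, 8), (7, 9), (3, 10)]`, three intervals overlap around the point 3.5, so at least three tracks are needed, and the sketch uses exactly three.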
A data-driven method for error detection and fault diagnosis in processor arrays is proposed under the assumption that data streams can only be inserted and observed through the boundary processors. The method consists of attaching tags to data streams, thereby allowing the data items to carry their own control and error information. Our goal is to detect the malfunction of a specific processor at a specific time step. A tag, which initially contains control information to activate a testing process, is changed to indicate the occurrence of an error by a checking processor detecting an inconsistency. To pinpoint a faulty processor, the front-end computer must go through the reverse process of identifying the processor that detected and signalled the inconsistency. Using two data streams, we can control every processor in the array and locate the faulty one. The resulting processor array is regular in structure, and the number of bits used to encode the control and error information is independent of the size of the array, thus leading to efficiency and scalability. (C) 1997 Elsevier Science B.V.
A dictionary machine is an important VLSI system that performs high-speed data archival operations. In this paper, we present a design that can efficiently implement dictionary machines in VLSI processor arrays. To process the operations of a dictionary machine effectively, a hexagonal mesh is selected as the host topology, in which two different networks for update and query operations are embedded. The proposed design is simple to implement and allows high throughput.
Fault-tolerance is undoubtedly a desirable property of any processor array. However, increased design and implementation costs should be expected when fault-tolerance is introduced into the architecture of a processor array. When the processor array is implemented within a single VLSI chip, these cost increases are directly related to the chip silicon area. Thus, the increase in area should be weighed against the improved performance of the gracefully degrading fault-tolerant processor array. In addition, a larger chip area might reduce the wafer yield to an unacceptable level, making the use of fault-tolerant VLSI processor arrays impractical. The objective of this paper is to devise performance measures for the evaluation of the effectiveness and area utilization of various fault-tolerant techniques. Another goal is to analyze the reduction in wafer yield and investigate the possibility of yield enhancement through redundancy.
We investigate the use of indirect addressing in processor arrays as a way to improve the processing of recursive neighbourhood (i.e., data-dependent) operations. The efficiency and speed of processing six such operations are measured for both window and crinkle mapping, and evaluated against the efficiency of traditional updating methods.
Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms, or algorithms with n nested loops, are mapped into (n - 1)-dimensional arrays. However, in practice, it is interesting to map n-dimensional algorithms into (k - 1)-dimensional arrays where k < n. For example, many algorithms at bit level are at least four-dimensional (matrix multiplication, convolution, LU decomposition, etc.) and most existing bit level processor arrays are two-dimensional. A computational conflict occurs if two or more computations of an algorithm are mapped into the same processor and the same execution time. In this paper, based on the Hermite normal form of the mapping matrix, necessary and sufficient conditions are derived to identify mappings without computational conflicts. These conditions are used to find time mappings of n-dimensional algorithms into (k - 1)-dimensional arrays, k < n, without computational conflicts. For some applications, the mapping is time-optimal.
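To make the notion of a computational conflict concrete, the definition above can be checked by brute force on a small index set. The sketch below is not the Hermite-normal-form test derived in the paper. It simply enumerates all index points and looks for two that receive the same image under the mapping matrix. The function name `find_conflict`, the box-shaped index set, and the convention that row 0 of `T` is the time schedule with the remaining rows giving processor coordinates are all assumptions made for this example.

```python
from itertools import product

def find_conflict(T, bounds):
    """Brute-force search for a computational conflict (illustrative sketch).

    T: k x n integer mapping matrix as a list of rows; by assumption row 0
       gives the execution time and the other rows the processor coordinates.
    bounds: n upper bounds; the index set is 0..bounds[d]-1 in dimension d.
    Returns a pair of distinct index points mapped to the same
    (time, processor) image, or None if the mapping is conflict-free.
    """
    seen = {}
    for j in product(*(range(b) for b in bounds)):
        # Image of index point j under T, i.e. the matrix-vector product T j.
        image = tuple(sum(t * x for t, x in zip(row, j)) for row in T)
        if image in seen:
            return seen[image], j
        seen[image] = j
    return None
```

With a 3-dimensional index set of size 3 × 3 × 3 mapped onto a 1-dimensional array (k = 2), the hypothetical matrix `[[1, 1, 1], [1, 0, 0]]` sends the points (0, 0, 1) and (0, 1, 0) to the same processor at the same time, whereas `[[1, 1, 3], [1, 0, 0]]` is conflict-free on this index set. The enumeration is exponential in n, which is why closed-form conditions such as those in the paper are needed for realistic problem sizes.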
An advanced spare-connection scheme for k-out-of-n redundancy called "generalized additional bypass linking" is proposed for constructing fault-tolerant massively parallel computers with series-connected, mesh-connected, or tree-connected processing element (PE) arrays. This scheme uses bypass links with wired-OR connections to selectively connect the primary PEs to a spare PE in parallel. These bypass links are allocated to the primary PEs by node-coloring of a graph with a minimum inter-node distance of three in order to minimize the number of bypass links (i.e., the chromatic number). The main advantage of this scheme is that it can be used for constructing various k-out-of-n configurations capable of enhanced PE-to-PE communication and broadcast while still achieving strong fault tolerance for these PEs and links. In particular, it enables the construction of optimal r-strongly-fault-tolerant configurations capable of direct k-out-of-n selections by providing r spare PEs and r extra connections per PE for any kind of array when node-coloring with a distance of three is used. This simple spare-circuit structure enhances fault tolerance more than conventional schemes do. The node-coloring patterns were constructed using new node-coloring algorithms, and the chromatic numbers were evaluated theoretically. Enhanced PE-to-PE communication and broadcast were achieved by using new fault-tolerant routing algorithms based on the properties of the node-coloring patterns, with four or five message transmission steps being optimal for configurations with arrays of any size.
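The distance-three coloring constraint mentioned above (any two nodes sharing a color are at least three hops apart) can be illustrated with a simple greedy procedure. This sketch is not one of the node-coloring algorithms of the paper, and a greedy pass does not in general reach the chromatic number. The function name `distance_three_coloring` and the adjacency-dictionary input format are assumptions made for this example.

```python
def distance_three_coloring(adj):
    """Greedy node coloring with minimum same-color distance three (sketch).

    adj: dict mapping each node to an iterable of its neighbors
         (undirected graph, every node present as a key).
    Returns a dict node -> color index such that nodes at distance
    one or two from each other never share a color.
    """
    colors = {}
    for v in adj:
        # Collect every node within distance one or two of v.
        near = set(adj[v])
        for u in adj[v]:
            near.update(adj[u])
        near.discard(v)
        # Pick the smallest color not used by an already-colored near node.
        used = {colors[u] for u in near if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors
```

On a series-connected array of six PEs, modeled here as a path graph, the sketch assigns the repeating pattern 0, 1, 2, 0, 1, 2, so three colors suffice in this small case.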