A common technique for improving the reliability of loop (or ring) networks is to introduce link redundancy; that is, to provide several alternative paths for communication between pairs of nodes. With alternate paths between nodes, the network can sustain several node and link failures by bypassing the faulty components. However, faults occurring at strategic locations in a ring can prevent computation by disrupting I/O operations, blocking the flow of information, or even segmenting the structure into pieces that are no longer suitable for any practical purpose. This paper gives an extensive characterization of fault tolerance in ring topologies, augmenting the results known in the literature to date. The characterization reveals several properties that describe the problem of constructing subrings and linear arrays in the presence of node failures in the original ring for a specified link configuration. Bounds are also established on the degree of fault tolerance achievable in a redundant loop network, with a given degree of redundancy, when performing a computation that requires a minimal number of operational nodes. Bounds are likewise derived on the size of the problems guaranteed to be solved in the presence of a given number of faults in the network.
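To make the effect of link redundancy concrete, the following is a minimal sketch, not the paper's construction. It assumes a ring of `n` nodes in which every node has bidirectional links to the nodes `s` positions away for each `s` in a hypothetical `skips` tuple, and it only measures connectivity among working nodes (the paper itself studies the stronger problem of building subrings and linear arrays). The function name and parameters are illustrative.

```python
from collections import deque


def largest_working_component(n, skips, failed):
    """Size of the largest connected group of working nodes.

    n      -- number of nodes in the ring, labeled 0..n-1
    skips  -- link distances; (1,) is a plain ring, (1, 2) adds skip-2 links
    failed -- iterable of failed node labels

    Assumption: links are bidirectional and only nodes (not links) fail.
    """
    failed = set(failed)
    seen = set()
    best = 0
    for start in range(n):
        if start in failed or start in seen:
            continue
        # Breadth-first search over working nodes only.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for s in skips:
                for nxt in ((node + s) % n, (node - s) % n):
                    if nxt not in failed and nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        best = max(best, size)
    return best
```

With `n = 8` and failures at nodes 0 and 4, a plain ring (`skips=(1,)`) splits into two pieces of three nodes, while adding skip-2 links keeps all six working nodes connected. Failures at strategic locations still segment the redundant network: failing nodes 0, 1, 4 and 5 leaves two isolated pairs.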
Hand-optimized Intellectual Property (IP) logic cores are widely used in hardware design. These IP cores range from rather complicated signal processing transforms and filters to arithmetic operators. While IP cores remain a standard way to exploit improvements in FPGA technology and to contend with time-to-market pressure through reuse, tools that generate hardware descriptions from high-level languages are growing in popularity. The PACT HDL behavioral synthesis tool attempts to combine these two methods within a power-aware framework. PACT HDL generates RTL HDL code in VHDL and Verilog using a finite state machine (FSM) style. This code uses intrinsic operators to represent calculations such as addition, subtraction, and multiplication. The output HDL code is passed to commercial RTL synthesis tools that generate the gate-level hardware descriptions. The synthesis tool replaces each intrinsic operator with a hardware implementation of the calculation. Unfortunately, when this decision is left to the synthesis tool, the gate-level instantiation may not suit the desired constraints, particularly those relating to power consumption. Synthesis tools tend to use combinational implementations that are area and power hungry. In some cases, such as the division operator, the tool may not be able to instantiate the appropriate logic at all.
At the logic level, a popular approach is to power down the sequential machine during the self-loops of the underlying finite state machine (FSM). In this work, we extend this idea to resynthesize existing sequential circuits to reduce power. We report a novel technique based on symbolic simulation of a sequential circuit to extract its self-loops without extracting the corresponding state transition graph (STG). Since self-loops may not be inherently present in the corresponding FSM, we partition the circuit heuristically and identify partial self-loops for each partition, then power down the corresponding sub-circuit by gating the clock sub-tree feeding that partition. Using this approach, we saved up to 45% of the total power on a controller circuit of a microprocessor design, where traditional techniques could not save any power.
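As an illustration of what a self-loop is and why it permits clock gating, here is a small sketch over an explicit transition table. This is deliberately not the technique in the abstract, which uses symbolic simulation precisely to avoid building the state transition graph; the table format and the names `self_loops` and `clock_enable` are assumptions made for this example.

```python
def self_loops(transitions):
    """Return the sorted (state, input) pairs whose next state equals the state.

    transitions -- dict mapping (state, input) -> next_state
                   (a hypothetical explicit FSM description)
    """
    return sorted((s, i) for (s, i), nxt in transitions.items() if nxt == s)


def clock_enable(transitions, state, inp):
    """True when the state registers must be clocked for this (state, input).

    On a self-loop the registers would reload their current value, so the
    clock feeding them can be gated off without changing behavior.
    """
    return transitions[(state, inp)] != state
```

For a two-state machine that stays put on input 0 and toggles on input 1, both `("IDLE", 0)` and `("RUN", 0)` are self-loops, so the clock is needed only when the input is 1.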
The main problem for the design of dictionary machines on coarse grained hypercube multiprocessors, in comparison to the widely studied dictionary problem for fine grained hypercube multiprocessors, is that due to une...
ISBN: 9780769519197 (print)
I/O performance remains a weakness of parallel computing systems today. While this weakness is partly attributed to rapid advances in other system components, the I/O interfaces available to programmers and the I/O methods supported by file systems have traditionally not matched efficiently with the types of I/O operations that scientific applications perform, particularly noncontiguous accesses. The MPI-IO interface allows for rich descriptions of the I/O patterns desired by scientific applications, and implementations such as ROMIO have taken advantage of this ability while remaining limited by the underlying file system methods. A method of noncontiguous data access, list I/O, was recently implemented in the Parallel Virtual File System (PVFS). We implement support for this interface in the ROMIO MPI-IO implementation. Through a suite of noncontiguous I/O tests, we compare ROMIO list I/O to current methods of ROMIO noncontiguous access and find that the list I/O interface provides performance benefits in many noncontiguous cases.
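The shape of a list I/O request can be sketched as follows: a noncontiguous access is described by a list of (offset, length) pairs and handed over in a single call, rather than issued as one call per region. This sketch uses only Python's standard file interface; it does not use the PVFS or ROMIO APIs, and the name `read_list` is hypothetical. Internally it still seeks and reads once per region, so it shows the interface only, not the performance benefit a file system gains by servicing the whole list at once.

```python
import io


def read_list(f, regions):
    """Read noncontiguous regions described as (offset, length) pairs.

    f       -- a seekable binary file object
    regions -- list of (offset, length) tuples, in the order wanted

    Returns the bytes of all regions concatenated in request order.
    """
    pieces = []
    for offset, length in regions:
        f.seek(offset)
        pieces.append(f.read(length))
    return b"".join(pieces)


# Usage: pick three 4-byte regions out of a 32-byte in-memory "file".
data = io.BytesIO(bytes(range(32)))
selected = read_list(data, [(0, 4), (8, 4), (16, 4)])
```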
An efficient branch and bound algorithm for fine-grained hypercube multiprocessors is presented. The method uses a global storage allocation scheme where all processors collectively store all back-up paths such that each processor needs to store only a constant amount of information. At each iteration of the algorithm, all nodes of the current back-up tree may decide whether they need to create new children, be pruned, or remain unchanged. An algorithm that, on the basis of these decisions, updates the current back-up tree and distributes global information in O(log m) steps, where m is the current number of nodes, is described. This method also provides a dynamic allocation mechanism that obtains optimal load balancing. Another important property of the method is that, even if very drastic changes in the current back-up tree occur, the performance of the load balancing mechanism remains constant. The method is currently being implemented on the Connection Machine.
Fault tolerance through the incorporation of redundancy and reconfiguration is quite common. Regular systems are being designed with massive redundancy built into them [5,6,12]. These systems also make use of the redu...
The computing power provided by high-performance, low-cost PC-based clusters and Grid computing platforms is attractive, and it equals or exceeds that of supercomputers and mainframes. In parallel, discussions on ...
Cluster and grid computing is a relatively new interdisciplinary field, with computer science, engineering and computational biology as its core supporting disciplines. The rise of cluster and grid computing discipli...
The parentheses matching problem is to determine the mate of each parenthesis in a balanced string of n parentheses. In this paper, we present three novel and elegant parallel algorithms for this problem on parallel r...
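For reference, the problem as defined above has a simple sequential solution using a stack. The sketch below is that sequential baseline only, not one of the parallel algorithms the abstract announces (its description is cut off); the function name `match_parentheses` is an assumption for this example.

```python
def match_parentheses(s):
    """Return a list `mate` where mate[i] is the index of the parenthesis
    matching s[i]. Raises ValueError if the string is not balanced.

    Sequential stack-based baseline running in O(n) time.
    """
    mate = [None] * len(s)
    stack = []  # indices of '(' still waiting for their mate
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            j = stack.pop()
            mate[i], mate[j] = j, i
        else:
            raise ValueError(f"unexpected character {ch!r} at index {i}")
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return mate
```

For the balanced string `"(()())"`, the outer pair is indices 0 and 5, and the two inner pairs are (1, 2) and (3, 4), so the result is `[5, 2, 1, 4, 3, 0]`.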