Simulated annealing based standard cell placement for VLSI designs has long been acknowledged as a compute-intensive process. All previous work in parallel simulated annealing based placement has minimized area, but with deep submicron design, minimizing wirelength delay is also needed. The algorithm discussed in this paper is the first parallel algorithm for timing-driven placement. We have used a very accurate Elmore delay model, which is more compute-intensive, and hence the need for parallel placement is more apparent. Parallel placement is also needed for very large circuits that may not fit in the memory of a single processor. Therefore, our algorithm is circuit-partitioned and can handle arbitrarily large circuits on distributed memory multiprocessors. The algorithm, called mpi PLACE, has been tested on several large benchmarks on a variety of parallel architectures.
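For readers unfamiliar with the Elmore delay model named in the abstract above, the following is a minimal sketch of the textbook computation on an RC tree: the delay to a node is the sum, over the edges on the path from the driver, of the edge resistance times the total capacitance downstream of that edge. The function name `elmore_delays` and the dictionary-based tree encoding are assumptions made for illustration; the abstract does not describe how the paper's placement tool represents nets.

```python
from collections import defaultdict


def elmore_delays(parent, resistance, capacitance, root=0):
    """Elmore delay from `root` to every node of an RC tree.

    parent[v]      -- parent of node v (the root maps to None)
    resistance[v]  -- resistance of the wire segment parent[v] -> v
    capacitance[v] -- capacitance lumped at node v
    (Hypothetical encoding, chosen only for this sketch.)
    """
    children = defaultdict(list)
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    # Breadth-first order: every parent appears before its children.
    order = [root]
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1

    # Accumulate downstream capacitance, visiting children before parents.
    downstream = dict(capacitance)
    for v in reversed(order):
        p = parent[v]
        if p is not None:
            downstream[p] += downstream[v]

    # Delay grows by R(edge) * C(downstream of edge) along each path.
    delay = {root: 0.0}
    for v in order[1:]:
        delay[v] = delay[parent[v]] + resistance[v] * downstream[v]
    return delay


# Small tree: 0 -> 1, then 1 -> 2 and 1 -> 3.
parent = {0: None, 1: 0, 2: 1, 3: 1}
resistance = {1: 2, 2: 3, 3: 1}
capacitance = {0: 0, 1: 1, 2: 4, 3: 2}
delays = elmore_delays(parent, resistance, capacitance)
```

Each call is linear in the number of tree nodes, but a timing-driven annealer would have to re-evaluate affected nets after every candidate move, which is consistent with the abstract's remark that the more accurate model raises the computational cost.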
Timing-driven global routing is an extremely important and time-consuming phase of any automated layout system. In this paper, by integrating high-performance interconnection tree construction, wire sizing, and switchable-segment channel optimization, we propose an adaptive timing-driven global routing algorithm that minimizes timing delay as well as circuit area. Our experiments on MCNC benchmarks show that our timing-driven global routing algorithm significantly reduces the maximum path delays compared with the global router TimberWolfSC. Based on this adaptive algorithm, a parallel algorithm for timing-driven global routing of standard cells is given. This algorithm has been implemented on an 8-processor IBM J-40 shared memory multiprocessor using the Message Passing Interface (MPI). Our experimental results on MCNC benchmark circuits show good speedup and circuit delay results for this parallel algorithm.
In this paper, we present BioGrid, a novel computing resource that combines advantages of grid computing technology with bioinformatics parallel applications. The grid environment permits the sharing of a large amount...
With the tremendous advances in processor and memory technology, I/O has risen to become the bottleneck in high-performance computing for many applications. The development of parallel file systems has helped to ease the performance gap, but I/O still remains an area needing significant performance improvement. Research has found that noncontiguous I/O access patterns in scientific applications, combined with the methods current file systems use to perform these accesses, lead to unacceptable performance for large data sets. To enhance the performance of noncontiguous I/O, we have created list I/O, a native version of noncontiguous I/O. We have used the Parallel Virtual File System (PVFS) to implement our ideas. Our research and experimentation show that list I/O outperforms current noncontiguous I/O access methods in most I/O situations and can substantially enhance the performance of real-world scientific applications.
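To make the idea of a noncontiguous access concrete, here is a small sketch of what a list-style read interface might look like: the caller describes all regions at once as parallel lists of file offsets and lengths, instead of issuing one request per region. The function `read_list` is hypothetical and is not the PVFS API; it runs against a local in-memory file and only illustrates the shape of the request, not the performance benefit, which in a real parallel file system would come from shipping the whole list to the servers in a single operation.

```python
import io


def read_list(f, file_offsets, file_lengths):
    """Read several noncontiguous regions described by (offset, length) lists.

    Hypothetical helper for illustration; a real list I/O call would hand
    the full list to the file system in one request.
    """
    if len(file_offsets) != len(file_lengths):
        raise ValueError("offset and length lists must have the same size")
    out = bytearray()
    for offset, length in zip(file_offsets, file_lengths):
        f.seek(offset)
        out += f.read(length)
    return bytes(out)


# A 4x4 byte matrix stored row-major; reading column 1 touches four
# separate one-byte regions, a typical strided scientific access pattern.
rows, cols = 4, 4
f = io.BytesIO(bytes(range(rows * cols)))
offsets = [r * cols + 1 for r in range(rows)]
lengths = [1] * rows
column = read_list(f, offsets, lengths)
```

Without such an interface, the same column read would cost one file system request per row, which is the kind of overhead the abstract attributes to existing noncontiguous access methods.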
Patterns of faults that are catastrophic for regular architectures, particularly systolic arrays, have been studied. For a given link configuration, there are many fault patterns which are catastrophic. Among those, there is a particular fault pattern, called the reference fault pattern, which is crucial for the development of testing techniques; furthermore, the efficiency of any testing algorithm can be improved in the presence of efficient algorithms for constructing the reference fault pattern. The authors develop a new algorithm for the construction of the reference fault pattern for VLSI reconfigurable arrays in which the links are bidirectional. The complexity of the new algorithm is O(kN), which is a significant improvement over the existing O(N^2) algorithm, where k is the number of bypass links and N is the length of the largest bypass link.
For a given design, it is not difficult to identify a set of elements whose failure will have catastrophic consequences. There exist many patterns (random distribution) of faults, not in a block, which can be fatal for the system. Therefore, the characterization of such fault patterns is crucial for the identification, testing and detection of such catastrophic events. This paper is concerned with the development of efficient recognition schemes; that is, efficient mechanisms which automatically determine whether or not an observed/detected pattern of faults will have catastrophic consequences. The problem of recognizing whether a fault pattern is catastrophic has been addressed.
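As an illustration of what a recognition scheme has to decide, the sketch below tests a fault pattern by plain reachability. It is not the paper's algorithm, and the array model is an assumption: a linear array of `n` processing elements, bidirectional links joining elements whose indices differ by one of the given link lengths, the input attached to the first `max(link_lengths)` elements and the output to the last `max(link_lengths)` elements. Under that model, a pattern is treated as catastrophic when no fault-free path joins input to output.

```python
from collections import deque


def is_catastrophic(n, link_lengths, faults):
    """Return True if faulty elements disconnect the input from the output.

    Assumed model (not taken from the abstract): elements 0..n-1, links
    between i and i +/- g for each g in link_lengths, input wired to the
    first g_max elements and output wired to the last g_max elements.
    """
    g_max = max(link_lengths)
    faulty = set(faults)
    starts = [i for i in range(min(g_max, n)) if i not in faulty]
    targets = set(range(max(n - g_max, 0), n))

    seen = set(starts)
    queue = deque(starts)
    while queue:
        i = queue.popleft()
        if i in targets:
            return False  # a fault-free path to the output exists
        for g in link_lengths:
            for j in (i - g, i + g):
                if 0 <= j < n and j not in faulty and j not in seen:
                    seen.add(j)
                    queue.append(j)
    return True


# Links of length 1 (regular) and 3 (bypass) on a 10-element array.
block = is_catastrophic(10, [1, 3], {3, 4, 5})      # contiguous block of faults
scattered = is_catastrophic(10, [1, 3], {3, 5, 7})  # faults not in a block
harmless = is_catastrophic(10, [1, 3], {3, 5})
```

In this model the scattered pattern {3, 5, 7} cuts the array just as the contiguous block {3, 4, 5} does, which matches the abstract's point that fatal patterns need not form a block. The search costs time proportional to the array size times the number of link lengths, so it serves only as a baseline against which a dedicated recognition scheme could be compared.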
In this paper we present some novel algorithms for scheduling hierarchical signal flow graphs in the domain of high-level synthesis. There are several key contributions of this paper. First, we develop a novel extension of the force-directed scheduling problem which naturally handles loops and conditionals by means of a scheme for scheduling hierarchical signal flow graphs. Second, we develop three new parallel algorithms for the scheduling problem. Third, our parallel algorithms are portable across a wide range of parallel platforms. We report results on a set of high-level synthesis benchmarks on an 8-processor SGI Challenge and a network of 4 SUN SPARCstation5 workstations. Finally, while some parallel algorithms for VLSI CAD reported by earlier researchers have shown a loss in the quality of results, our parallel algorithms produce exactly the same results as the sequential algorithms on which they are based.
The co-allocation architecture was developed in order to enable parallel downloads of datasets from multiple servers. Several co-allocation strategies have been coupled and used to exploit rate differences among vario...