An approach to the three-layer or four-layer channel-routing problem is presented. A general technique that systematically transforms a two-layer routing solution into a three-layer routing solution is developed. The proposed router performs well in comparison with other three-layer channel routers proposed thus far. In particular, it provides a ten-track optimal solution for Deutsch's famous difficult example, whereas other well-known three-layer channel routers required 11 or more tracks. The approach is extended to four-layer channel routing. Given any two-layer channel-routing solution without an unrestricted dogleg that uses w tracks, the router can obtain a four-layer routing solution using no more than w/2 tracks. A theoretical upper bound of d/2+2 tracks for arbitrary four-layer channel-routing problems is also given.
A new fast approach to electrical power network analysis is presented in this paper. A real-time multimachine power system simulator, developed at the University of Bath, is described; it has been used in place of the more conventional fast decoupled load flow. The operation of the algorithms developed as part of this simulator is described, together with an explanation of how fuzzy set theory has been used for contingency screening and analysis, as well as for the identification of system transient and dynamic instability. Results are presented for these fuzzy algorithms and compared with traditional numerical methods for the IEEE 57 bus system and a reduced British National Grid network. Finally, conclusions are drawn in the light of these results, illustrating the benefits that can be achieved with respect to accuracy and computational speed.
Due to architectural design, process variations, and aging, individual cores in many-core systems exhibit heterogeneous performance. In many-core systems, a commonly adopted soft error mitigation technique is Redundant Multithreading (RMT), which achieves error detection and recovery for an application through redundant thread execution on different cores. However, task mapping and the task execution mode (i.e., whether a task executes in a reliable mode with RMT or an unreliable mode without RMT) need to be considered to achieve resource-efficient reliability. This paper explores how to efficiently assign tasks to cores with heterogeneous performance properties and how to determine the execution modes of tasks in order to achieve high reliability and satisfy the tolerance of timeliness. We demonstrate that the task mapping problem under heterogeneous performance can be solved by employing the Hungarian algorithm as a subroutine to efficiently assign tasks to cores, optimizing the system reliability with polynomial time complexity. To obtain efficient task execution modes, we also propose an iterative mode adaptation technique that guarantees the tolerable timing constraint. Our results illustrate that, compared to the state of the art, the proposed approaches achieve up to 80 percent reliability improvement (20 percent on average) under different scenarios of chip frequency variation maps.
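To make the mapping step concrete, the following is a minimal sketch of a Hungarian-algorithm assignment of tasks to cores. It is not the paper's implementation: the function names `hungarian` and `map_tasks`, the reliability matrix, and the assumption that system reliability is the product of independent per-task reliabilities (so that maximizing it equals minimizing the sum of negative logarithms) are all illustrative assumptions.

```python
import math


def hungarian(cost):
    """Return a minimum-cost assignment for a square cost matrix.

    result[i] is the column (core) assigned to row (task) i.
    Standard O(n^3) potentials-based formulation.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (n + 1)      # column potentials (1-based)
    p = [0] * (n + 1)        # p[j] = row matched to column j, 0 = unmatched
    way = [0] * (n + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [0] * n
    for j in range(1, n + 1):
        result[p[j] - 1] = j - 1
    return result


def map_tasks(reliability):
    """Assumed model: reliability[t][c] is the probability that task t
    finishes correctly on core c, and tasks fail independently."""
    cost = [[-math.log(r) for r in row] for row in reliability]
    return hungarian(cost)


# Hypothetical 3-task, 3-core reliability matrix.
example = [
    [0.90, 0.80, 0.70],
    [0.85, 0.95, 0.60],
    [0.70, 0.75, 0.99],
]
mapping = map_tasks(example)
```

Under the assumed independence model, `mapping[t]` gives the core chosen for task `t`. The iterative mode adaptation described in the abstract is not modelled here.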
Variants of the Winograd FFT algorithm for prime transform size are derived that offer options as to operation counts and arithmetic balance. Their implementations on VAX, IBM 3090 VF, and IBM RS/6000 are discussed. For processors that perform floating-point addition, floating-point multiplication, and floating-point "multiply-add" with the same time delay, variants of the FFT algorithm have been designed such that all floating-point multiplications can be overlapped by using "multiply-add." The use of a tensor product formulation throughout gives a means of producing algorithm variants matched to computer architectures.
Explicit formulas are derived for designing lattice wave digital filters of the most common filter types: Butterworth, Chebyshev, inverse Chebyshev, and Cauer parameter (elliptic) filter responses. Using these formulas, a direct top-down design method is obtained, and most practical design problems can be solved without special knowledge of filter synthesis methods. Since the formulas remain simple even for elliptic filters, the design process is simple enough to serve as the basis for the first part (filter design from specs to algorithm) of silicon compilers, or to be applied to high-level programmable digital signal processors.
Approximate computing is claimed to be a powerful knob for alleviating peak power and energy-efficiency issues. However, providing a consistent benchmark suite with diverse applications amenable to approximate computing is crucial to ensure fair and reproducible comparisons. This article makes an important attempt toward it in the form of the AxBench suite, which contains applications for CPUs, GPUs, and hardware design, with the necessary annotations to mark the approximable regions and output quality metrics. -Muhammad Shafique, Vienna University of Technology.
Network management protocols often require timely and meaningful insight about per-flow network traffic. This paper introduces Randomized Admission Policy (RAP), a novel algorithm for the frequency, top-k, and byte volume estimation problems, which are fundamental in network monitoring. For the frequency estimation problem, we demonstrate space reductions compared to the alternatives by a factor of up to 32 on real packet traces and up to 128 on heavy-tailed workloads. For top-k identification, RAP exhibits memory savings by a factor of between 4 and 64, depending on the workloads' skewness. These empirical results are backed by formal analysis, indicating the asymptotic space improvement of our probabilistic admission approach. In addition, we present d-way RAP, a hardware-friendly variant of RAP that empirically maintains its space and accuracy benefits.
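The abstract does not spell out the admission rule, so the sketch below is only one plausible reading of a randomized admission policy layered on a fixed-size counter table. It assumes that, when the table is full, an unmonitored flow replaces the flow with the minimal counter `c_min` with probability `1 / (c_min + 1)` and then starts at `c_min + 1`. The class name `RandomizedAdmissionCounter` and its methods are hypothetical, and the linear scan for the minimum is chosen for brevity, not speed.

```python
import random


class RandomizedAdmissionCounter:
    """Fixed-capacity frequency table with probabilistic admission (sketch)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.counters = {}
        self.rng = rng if rng is not None else random.Random()

    def add(self, item):
        if item in self.counters:
            # Monitored flow: plain increment.
            self.counters[item] += 1
        elif len(self.counters) < self.capacity:
            # Free slot: admit unconditionally.
            self.counters[item] = 1
        else:
            # Table full: challenge the flow with the smallest counter.
            victim = min(self.counters, key=self.counters.get)
            c_min = self.counters[victim]
            # Assumed admission rule: replace with probability 1/(c_min + 1).
            if self.rng.random() < 1.0 / (c_min + 1):
                del self.counters[victim]
                self.counters[item] = c_min + 1

    def estimate(self, item):
        """Estimated frequency; 0 for flows that are not monitored."""
        return self.counters.get(item, 0)

    def top_k(self, k):
        """The k monitored flows with the largest counters."""
        ranked = sorted(self.counters.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


table = RandomizedAdmissionCounter(capacity=2)
for flow in "aaab":
    table.add(flow)
```

The intuition behind this kind of rule is that infrequent flows rarely displace established heavy flows, which is consistent with the memory savings on skewed workloads reported above. The d-way variant is not modelled.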
In this paper, instead of using the base-2 number system, we use a base-m number system to represent the numbers used in the proposed algorithms. Such a strategy can be used to design an O(T) time, T = [log_m N] + 1, prefix sum algorithm for a binary sequence of N bits on a cross-bridge reconfigurable array of processors using N processors, where the data bus is m bits wide. Then, this basic operation can be used to compute the histogram of an n x n image with G gray levels in constant time using G x n x n processors, and to compute the Hough transform of an image with N edge pixels and an n x n parameter space in constant time using n x n x N processors. These results are better than the previously known results in the literature [18], [32]. Also, the execution time of the proposed algorithms is tunable by the bus bandwidth.
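As a small illustration of where T comes from, the sequential sketch below assumes the brackets in T denote the floor function, so that T is the number of base-m digits needed to write N. Any prefix sum of an N-bit binary sequence is at most N and therefore fits in T base-m digits. The helper names are hypothetical, and this does not model the reconfigurable bus or the parallel algorithm itself.

```python
def num_digits(n, m):
    """Number of base-m digits of n >= 1, i.e. floor(log_m n) + 1."""
    t = 0
    while n > 0:
        n //= m
        t += 1
    return t


def to_base(n, m, width):
    """Base-m digits of n, least significant first, padded to width."""
    digits = []
    for _ in range(width):
        digits.append(n % m)
        n //= m
    return digits


def prefix_sums(bits):
    """Inclusive prefix sums of a 0/1 sequence (sequential reference)."""
    out = []
    total = 0
    for b in bits:
        total += b
        out.append(total)
    return out


bits = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]   # N = 10
m = 4                                    # assumed bus width / radix
T = num_digits(len(bits), m)             # 2 digits suffice for values <= 10
encoded = [to_base(s, m, T) for s in prefix_sums(bits)]
```

A larger m gives a smaller T, which matches the abstract's remark that the execution time is tunable by the bus bandwidth.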
A method for finding an adaptive polygonal approximation of an implicitly defined surface is presented. For algebraic surfaces, the method yields an approximation guaranteed accurate to within some user-specified tolerance of the actual surface. This polygonalization can then be rendered using standard shaded polygon drawing techniques. A method for eliminating the 'skinny' polygons that often arise in traditional polygonalization methods, or improving their aspect ratios, is also presented. This method has proved particularly useful in the creation of polygonalizations for finite-element analysis.
When an over-the-cell routing layer is available for standard cell layout, efficient utilization of that routing space over the cells can significantly reduce layout area. In this paper, we present three physical models to utilize the area over the cells for routing in standard cell designs. We also present efficient algorithms to choose and to route a planar subset of nets over the cells so that the resulting channel density is reduced as much as possible. For each of the physical models, we show how to arrange inter-cell routing, over-the-cell routing, and power/ground buses to achieve valid routing solutions. Each algorithm exploits the particular arrangement in the corresponding physical model and produces provably good results in polynomial time. We tested our algorithms on two industrial standard cell designs. In these tests, this method reduces total channel density by as much as 21%.
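For readers unfamiliar with the metric, channel density can be sketched as the largest number of nets whose horizontal spans cross any single column of the channel. The code below assumes each net is given as an inclusive `(left, right)` column span; the function name and the example nets are hypothetical and are not taken from the paper's test designs.

```python
def channel_density(nets):
    """Maximum number of net spans that overlap at any column.

    nets: iterable of (left, right) inclusive column spans.
    """
    events = []
    for left, right in nets:
        events.append((left, 1))        # span opens at its left column
        events.append((right + 1, -1))  # span closes just past its right column
    # At equal positions, closings (-1) sort before openings (+1).
    events.sort()
    density = 0
    current = 0
    for _, delta in events:
        current += delta
        density = max(density, current)
    return density


# Hypothetical channel with five nets.
all_nets = [(1, 4), (2, 6), (3, 5), (5, 8), (7, 9)]
before = channel_density(all_nets)

# Suppose net (3, 5) is routed over the cells instead of in the channel.
in_channel = [net for net in all_nets if net != (3, 5)]
after = channel_density(in_channel)
```

In this toy case, moving one net out of the channel lowers the density from 3 to 2, which is the kind of reduction the over-the-cell selection algorithms aim to maximize.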