The paper presents new logic synthesis methods for single-output incomplete multi-level binary circuits using memristor-based material implication gates. The first method follows Lehtonen's assumption of using only two working memristors. The algorithm minimizes the number of implication (IMPLY) gates, which corresponds to minimizing the number of pulses or the delay time. This greedy search method uses essential and secondary essential primes, does not require solving the covering problem, is fast, and produces high-quality results. We compare it to other synthesis methods, such as the modified SOP and Exclusive-Or Sum of Products (ESOP) with a minimum number of working memristors. We analyze how adding more working memristors reduces the IMPLY gate count, and we introduce Imply Sequence Diagrams, a new notation similar to one used in reversible logic.
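To make the link between IMPLY gate count and pulse count concrete, here is a minimal sketch of the well-known NAND construction from one FALSE (reset) operation and two IMPLY operations. It is not the paper's synthesis algorithm. The `ImplyMachine` class, the memristor names `w0` and `w1`, and the choice to count the FALSE operation as one pulse are assumptions made for illustration.

```python
def imply(p: int, q: int) -> int:
    """Material implication p -> q on 0/1 values: (not p) or q."""
    return (1 - p) | q


class ImplyMachine:
    """Hypothetical model: named memristors holding 0/1, one pulse per operation."""

    def __init__(self, inputs, n_work=2):
        self.state = dict(inputs)
        for i in range(n_work):
            self.state[f"w{i}"] = 0  # working memristors (assumed names)
        self.pulses = 0

    def false(self, q):
        """FALSE operation: reset memristor q to 0 (assumed to cost one pulse)."""
        self.state[q] = 0
        self.pulses += 1

    def imp(self, p, q):
        """IMPLY operation: q <- (p -> q); the result overwrites q."""
        self.state[q] = imply(self.state[p], self.state[q])
        self.pulses += 1


def nand(a: int, b: int):
    """Compute NAND(a, b) in one working memristor; return (value, pulses)."""
    m = ImplyMachine({"a": a, "b": b})
    m.false("w0")       # w0 = 0
    m.imp("a", "w0")    # w0 = not a
    m.imp("b", "w0")    # w0 = (not b) or (not a) = NAND(a, b)
    return m.state["w0"], m.pulses


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, nand(a, b))
```

Because every IMPLY overwrites its target memristor, the sequence length tracks the delay directly, which is why a synthesis method of this kind minimizes the number of IMPLY operations.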
Pipelining is an important technique in high-level synthesis, which overlaps the execution of successive loop iterations or threads to achieve high throughput for loop/function kernels. Since existing pipelining techniques typically enforce in-order thread execution, a variable-latency operation in one thread would block all subsequent threads, resulting in considerable performance degradation. In this paper, we propose a multithreaded pipelining approach that enables context switching to allow out-of-order thread execution for data-parallel kernels. To ensure that the synthesized pipeline is complexity effective, we further propose efficient scheduling algorithms for minimizing the hardware overhead associated with context management. Experimental results show that our proposed techniques can significantly improve the effective pipeline throughput over conventional approaches while conserving hardware resources.
Summary form only given. Escalating costs of semiconductor technology and its lagging performance relative to historic trends are motivating acceleration and specialization as more impactful means to increase system value. Targeted specialization is increasingly pursued as an important way to achieve dramatic improvements in workload acceleration. This requires a broad understanding of workloads, system structures, and algorithms to determine what to accelerate or specialize, and how: via software, via hardware, or via both. The many resulting choices necessitate co-optimization of software and hardware. In this talk, we will focus on an application-driven approach to high-level design for software and system co-optimization, based on inventing new software algorithms that have strong affinity to hardware acceleration.
Post-fabrication performance compensation and adaptive delay testing are indispensable means for improving the yield and reliability of LSIs. Estimates of global parameters, such as threshold voltages, play a key role in maximizing their effectiveness. This paper proposes a novel technique that realizes accurate device-parameter estimation through an Fmax testing framework. In the proposed method, statistical path delay distributions of the paths sensitized in Fmax testing are used to calculate the device parameters that most likely explain the Fmax measurements. Two estimation procedures are proposed: one uses discrete Bayesian estimation and the other uses maximum likelihood estimation. Numerical experiments demonstrate that both methods achieve 2.5 mV accuracy in estimating threshold voltages.
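The idea of picking the parameter value that best explains the measurements can be sketched as a discrete Bayesian update over a grid of candidate threshold-voltage shifts. Everything specific here is an assumption for illustration, not the paper's model: the linear delay model in `predicted_delay`, the Gaussian measurement noise with standard deviation `sigma_ps`, the uniform prior, and all numeric values.

```python
import math


def predicted_delay(nominal_ps, sensitivity_ps_per_mv, dvth_mv):
    """Hypothetical linear model: path delay grows with the Vth shift."""
    return nominal_ps + sensitivity_ps_per_mv * dvth_mv


def posterior(measured_ps, paths, candidates_mv, sigma_ps):
    """Posterior over candidate Vth shifts, assuming a uniform prior.

    measured_ps: measured delay per sensitized path
    paths: (nominal_ps, sensitivity_ps_per_mv) per path, same order
    """
    log_like = []
    for dv in candidates_mv:
        total = 0.0
        for m, (nom, sens) in zip(measured_ps, paths):
            residual = m - predicted_delay(nom, sens, dv)
            total += -0.5 * (residual / sigma_ps) ** 2  # Gaussian log-likelihood
        log_like.append(total)
    top = max(log_like)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_like]
    norm = sum(weights)
    return [w / norm for w in weights]


def estimate(measured_ps, paths, candidates_mv, sigma_ps):
    """Return the candidate with the highest posterior probability."""
    post = posterior(measured_ps, paths, candidates_mv, sigma_ps)
    best = max(range(len(post)), key=post.__getitem__)
    return candidates_mv[best]


if __name__ == "__main__":
    paths = [(500.0, 2.0), (800.0, 3.0)]   # made-up path data
    measured = [520.0, 830.0]              # consistent with a +10 mV shift
    grid = list(range(-20, 21, 5))
    print(estimate(measured, paths, grid, sigma_ps=5.0))
```

With a uniform prior the highest-posterior candidate coincides with the maximum likelihood estimate on the same grid, so the two procedures named in the abstract differ only when prior knowledge is added.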
VLSI systems are commonly specified using sequential executable functional specifications, but implemented in a highly concurrent manner. Although the methods to transform between the sequential specification and the concurrent implementation have been well studied, there are still substantial difficulties in verifying that the concurrent implementation corresponds to the sequential specification after low-level optimization. The majority of methods for this verification have focused on strong semantic models for reasoning about systems and their specifications, but these models can add significant unnecessary complexity. In this paper, we explore a weak but effective method for reasoning about implementation relations. We show how a sequential embedding of a concurrent program can be generated, and how that embedding can be used to dramatically reduce the reachable state space of the verification problem while maintaining the semantic model of interest.
The recent TAU computer-aided design (CAD) contest sought novel ideas for accurate and fast clock network pessimism removal (CNPR). Unnecessary pessimism forces the static timing analysis (STA) tool to report worse violations than the true timing properties of the physical circuit, thereby misleading signoff timing into a lower clock frequency than the actual silicon implementation can sustain. We therefore introduce UI-Timer, a powerful CNPR algorithm that achieves exact accuracy and ultra-fast runtime. Unlike existing approaches, which are dominated by explicit path search, UI-Timer shows that an implicit path representation significantly reduces the search effort. Our timer saves both space and time: memory storage and important timing quantities are available in constant space and constant time per path during the search. Experimental results on industrial benchmarks released for the TAU 2014 CAD contest confirm that UI-Timer achieved the best result in terms of accuracy and runtime among all participating timers.
With the continuous shrinking of minimum feature sizes beyond the current 193 nm wavelength for optical microlithography, the electronics industry relies on Resolution Enhancement Techniques (RETs) to improve pattern transfer fidelity. However, the lithographic process is susceptible to dose and focus variations that will eventually cause lithographic yield degradation. In this paper, a new algorithm is proposed to minimize the Edge Placement Error (EPE) and the process variability of the printed image. The algorithm is also adapted to reduce computation time through a novel approach that minimizes the number of convolutions during lithography simulation. Experimental results show that the proposed algorithm achieves a lower average cost than the top three teams of the ICCAD 2013 contest on the public benchmarks.
As integrated circuit process technology progresses into the deep sub-micron region, process variation has a growing impact on the design and analysis of digital circuits, and more specifically on the accuracy and integrity of timing analysis methods. The assumptions made by the analytical models impose excessive and unwanted pessimism in timing analysis. Removing this inherent pessimism is therefore of utmost importance for accuracy. In this paper, TKtimer, an approach to the common path pessimism removal timing analysis problem, is presented. By utilizing key techniques such as branch-and-bound, caching, task-level parallelism, and enhanced algorithmic techniques, the approach handles clock network trees of any type and size, and it showed 100% accuracy combined with reasonable execution time within a straightforward solution context.
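The pessimism in question arises because the launch and capture clock paths share a segment of the clock tree, yet the analysis charges that segment its late delay on one path and its early delay on the other. A minimal sketch of the resulting credit, assuming a clock tree stored as a child-to-parent map with per-node early and late delays in integer picoseconds, is shown below. The names `path_to_root` and `cppr_credit` and the example tree are hypothetical, and the sketch uses none of TKtimer's techniques (branch-and-bound, caching, parallelism).

```python
def path_to_root(node, parent):
    """Return the clock-tree path from the root down to `node`."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path[::-1]  # root first


def cppr_credit(launch, capture, parent, early, late):
    """Sum (late - early) delay over the clock path shared by both pins.

    early[n] / late[n] are the early and late delays of the segment
    entering node n (assumed data layout; the root carries zero delay).
    """
    credit = 0
    for a, b in zip(path_to_root(launch, parent), path_to_root(capture, parent)):
        if a != b:
            break  # the paths diverge here; the rest is not common
        credit += late[a] - early[a]
    return credit


if __name__ == "__main__":
    # Hypothetical tree: root -> b1 -> {ff1, ff2}, root -> b2 -> ff3
    parent = {"b1": "root", "b2": "root", "ff1": "b1", "ff2": "b1", "ff3": "b2"}
    early = {"root": 0, "b1": 100, "b2": 100, "ff1": 50, "ff2": 50, "ff3": 50}
    late = {"root": 0, "b1": 120, "b2": 130, "ff1": 60, "ff2": 60, "ff3": 60}
    print(cppr_credit("ff1", "ff2", parent, early, late))  # shares root and b1
    print(cppr_credit("ff1", "ff3", parent, early, late))  # shares only root
```

The credit is added back to the slack of a timing test between the two flip-flops. The hard part of the contest problem is finding the worst post-credit paths without enumerating every launch and capture pair, which this sketch does not attempt.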
Digital microfluidic biochips enable a higher degree of automation in laboratory procedures in biochemistry and molecular biology and have received significant attention in the recent past. Their design is usually conducted in several stages, with routing being a particularly critical challenge. Previously proposed solutions for this design step suffer from two issues: they are mainly heuristic in nature, and they usually assume that the blockages to be bypassed are present the entire time. In contrast, we present a methodology that exploits the fact that blockages are often present only during certain intervals. At the same time, our approach guarantees exact solutions, i.e., it always determines a routing with a minimal number of time steps. Experimental results show that, despite the huge complexity, optimal results can be achieved in reasonable run-time and that considering temporary blockages significantly improves the routing results.
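The benefit of modelling blockages as temporary can be illustrated with a single droplet on a small grid. The sketch below uses a plain breadth-first search over (cell, time step) states, which is a stand-in chosen for brevity and not the exact methodology of the paper. The function name `min_steps`, the inclusive interval format for blockages, and the `horizon` bound are assumptions.

```python
from collections import deque


def min_steps(width, height, start, goal, blockages, horizon=64):
    """Fewest time steps for one droplet to reach `goal`, or None.

    blockages maps a cell (x, y) to an inclusive interval (t_from, t_to)
    during which the cell may not be occupied (assumed format).
    """
    def blocked(cell, t):
        interval = blockages.get(cell)
        return interval is not None and interval[0] <= t <= interval[1]

    queue = deque([(start, 0)])
    seen = {(start, 0)}
    while queue:
        (x, y), t = queue.popleft()
        if (x, y) == goal:
            return t  # BFS visits states in time order, so t is minimal
        if t == horizon:
            continue
        # The droplet may wait in place or move to a 4-neighbour.
        for dx, dy in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            state = (nxt, t + 1)
            if blocked(nxt, t + 1) or state in seen:
                continue
            seen.add(state)
            queue.append(state)
    return None


if __name__ == "__main__":
    # 3x1 corridor whose middle cell is blocked during time steps 0..2.
    print(min_steps(3, 1, (0, 0), (2, 0), {(1, 0): (0, 2)}))
    # The same blockage treated as permanent makes the goal unreachable.
    print(min_steps(3, 1, (0, 0), (2, 0), {(1, 0): (0, 64)}))
```

In the corridor example the droplet waits until the middle cell clears and arrives after four time steps, whereas treating the same blockage as permanent leaves no route at all. Routing several droplets at once, with their mutual interference constraints, is what makes the real problem hard and is outside this sketch.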
ISBN: (print) 9781479962792
Compressed caches have been used in the shared last-level cache (LLC) to increase its effective capacity. However, because compressed data sizes vary, storage fragmentation is inevitable in this cache design. When it occurs, a compaction process is usually invoked to create contiguous storage space. This compaction process incurs an extra cycle penalty and degrades the effectiveness of the compressed cache design. In this paper, we propose a compaction-free compressed cache architecture that completely eliminates the time spent on compaction. Compared with a conventional cache, our design improves system performance by 16% and reduces energy by 16%. Compared with the work by Alameldeen et al. [1], our design achieves 5% more performance improvement and 3% more energy reduction. Compared with the work by Sardashti et al. [2], our design achieves 3% more performance improvement and 2% more energy reduction.