Field-programmable logic (FPL) densities and performance have steadily improved, allowing DSP solutions to be integrated on a single FPL chip. The primary limitation of FPLs in DSP-centric applications is their intrinsically weak arithmetic performance compared to DSP microprocessors and ASICs. In some cases, distributed arithmetic (DA) has been used to mask FPL arithmetic inadequacies. The Residue Number System (RNS) has demonstrated an ability to support high-bandwidth arithmetic with limited resources. This paper presents a methodology for merging distributed arithmetic with the residue number system to achieve high-performance FPL solutions.
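To make the combination concrete, the sketch below shows one way distributed arithmetic and RNS can be paired for a fixed-coefficient inner product. It is an illustration under stated assumptions, not the paper's design: the moduli (7, 11, 13), the function names, and the restriction to non-negative integers whose result fits below the product of the moduli are all chosen here for the example. Each residue channel replaces multiplications with a lookup table of coefficient sums addressed by one bit from every input, and the channel results are recombined with the Chinese Remainder Theorem.

```python
from math import prod

# Hypothetical moduli chosen for illustration; pairwise coprime, range M = 1001.
MODULI = (7, 11, 13)

def from_rns(residues, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction of an integer in [0, M)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse needs Python 3.8+
    return total % M

def da_table(coeffs, m):
    """Table of all 2^N coefficient subset sums, reduced mod m."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1) % m
        for addr in range(1 << n)
    ]

def da_inner_product_mod(coeffs, xs, m, bits):
    """Bit-serial distributed-arithmetic inner product in one residue channel."""
    table = da_table(coeffs, m)
    acc = 0
    for b in range(bits):
        # Build the table address from bit b of every input word.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        # Weight the looked-up partial sum by 2^b, all modulo m.
        acc = (acc + table[addr] * pow(2, b, m)) % m
    return acc

def rns_da_inner_product(coeffs, xs, moduli=MODULI):
    """Inner product of non-negative integers; result must be below prod(moduli)."""
    residues = []
    for m in moduli:
        c_m = [c % m for c in coeffs]
        x_m = [x % m for x in xs]
        bits = (m - 1).bit_length()  # residues are smaller than m
        residues.append(da_inner_product_mod(c_m, x_m, m, bits))
    return from_rns(residues, moduli)
```

Because every channel works on residues only a few bits wide, the tables stay small and the channels are independent of one another, which is the property that makes the pairing attractive for lookup-table-based logic.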
A multistatic frequency-modulated continuous wave (FMCW) radar system is under development for use in the control system of autonomous vehicles. The aim of the system, named Colarado, is the 3D location of obstacles in the surrounding environment. This paper describes a laboratory prototype of the system, the demonstrator, and presents current results.
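As background for the FMCW principle mentioned above, the following minimal sketch applies the standard textbook relation between beat frequency and target range for a linear sawtooth chirp. It is not taken from the paper: the function names and all numeric parameters (sweep bandwidth, sweep time, beat frequency) are hypothetical, and the stationary-target, single-antenna case is assumed.

```python
# Speed of light in vacuum, m/s.
C = 299_792_458.0

def fmcw_range(beat_hz, bandwidth_hz, sweep_s):
    """Range of a stationary target from the beat frequency of a linear chirp.

    The chirp slope is bandwidth / sweep time; the round-trip delay is
    beat frequency / slope, and range is half the delay times c.
    """
    slope = bandwidth_hz / sweep_s
    delay = beat_hz / slope
    return C * delay / 2.0

def fmcw_range_resolution(bandwidth_hz):
    """Theoretical range resolution c / (2B) of an FMCW sweep."""
    return C / (2.0 * bandwidth_hz)

# Hypothetical example values, not from the paper.
example_range_m = fmcw_range(beat_hz=100e3, bandwidth_hz=150e6, sweep_s=1e-3)
example_resolution_m = fmcw_range_resolution(150e6)
```

With these assumed values the computed range is roughly 100 m and the resolution roughly 1 m. A multistatic arrangement would combine several such range measurements from different antenna positions to locate an obstacle in 3D, a step not shown here.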
ISBN: (print) 9783540634409
Performance tuning of applications for shared-memory multiprocessors is to a great extent concerned with removal of performance bottlenecks caused by communication among the processors. To simplify performance tuning, our approach has been to extend the hardware/software interface with powerful memory-control primitives in combination with compiler optimizations to remove communication bottlenecks in distributed shared-memory multiprocessors. Evaluations have shown that this combination can yield quite dramatic application performance improvements. This raises the fundamental question of how the hardware/software interface in future distributed shared-memory machines should be defined to serve as a good target for performance tuning of shared-memory programs, either automatically or by hand. An approach along those lines is discussed.
ISBN: (print) 9780818675973
We address the problem of CMOS cell width minimization in the general two-dimensional (2-D) layout style and propose a novel technique based on integer linear programming (ILP) to solve it exactly. We formulate a 0-1 ILP model whose solution minimizes cell width along with the routing complexity across the diffusion rows. We present experimental results that evaluate the performance of two ILP solvers that have very different solution methods, and assess the effect of the number of rows on cell width. Run-times for optimal layouts are in seconds for cells with up to 20 transistors. For larger cells, we propose a practical circuit pre-processing scheme that dramatically reduces the run time with little or no loss in optimality.
ISBN: (print) 9780897919203
We present a novel technique CLIP for optimizing both the height and width of CMOS cell layouts in the two-dimensional (2-D) style. CLIP is based on integer-linear programming (ILP) and proceeds in two stages: First, an ILP model is used to determine a 2-D layout of minimum width W_cell. Then, another model generates a 2-D layout that has width W_cell and requires a minimum number of routing tracks. Run times are in seconds for circuits with up to 16 transistors. For larger circuits, we extend CLIP to a hierarchical method HCLIP that places series-connected transistors contiguously. This reduces run times by up to three orders of magnitude, and still yields optimal results in over 80% of cases.
This paper presents a hardware architecture and a software tool needed for future autonomous robots. Specific attention is given to the execution of artificial neural networks and to the need for a good inspection and...
In this paper, we assert that the window system plays a fundamental role in supporting multiple interaction channels distributed over a finite number of I/O devices. For historical as well as technical reasons, window sys...
Numerical Weather Prediction (NWP) is acknowledged as being of vital importance to the economy. The demand that NWP places on computing system performance has increased dramatically since the introduction of computer syst...
A new parallel processing system has been proposed, and a prototype model of the system has been constructed. It is designed to perform parallel vector operations at maximum efficiency. In addition, it can also handle...