ISBN: (Print) 9781595939340
Large-scale, direct-mapped FPGA computing systems are traditionally very difficult to debug due to the high level of parallelism and limited access to internal signal values. This poster describes our solution to this problem, in which the concepts of variables and process control are brought into the FPGA hardware domain. Declarations made in the design environment are translated into logic inserted automatically into the hardware implementation. Variables provide full read/write access to hardware signals during runtime, complete with dynamic assertion checking capable of automatically halting the system clock. System data is consistently cached via attached DRAM, providing a very deep history of variable sample values and the ability to "rewind" system state. Process execution can also be controlled by the user on a cycle-by-cycle basis, either manually or through the declaration of breakpoints. Assertion failures and breakpoints are accurate within the same cycle of detection, and are implemented using on-chip gated clock buffers. All debugging controls are provided by a remote graphical user interface, which also supports back-annotation in the input design for improved data visibility and comprehension. The complete hardware and software infrastructure of the debugger has already been fully implemented, with user trials and overhead measurements ongoing at the time of writing.
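The abstract's key mechanism is that assertion failures and breakpoints stop the design in the very cycle they are detected, via gated clock buffers. A minimal behavioural sketch of that control loop, written in Python rather than HDL and with every name (DebugController, watch, on_cycle, ...) invented here purely for illustration, might look like this:

    # Hypothetical software model of the debugger's clock-gating decision.
    # Each watched "variable" is a callable returning the current signal value;
    # each assertion is a predicate over those values.

    class DebugController:
        def __init__(self):
            self.watches = {}        # name -> signal getter
            self.assertions = []     # (name, predicate over one cycle's samples)
            self.breakpoints = set() # cycle numbers at which to halt
            self.history = []        # per-cycle samples (models the DRAM trace)
            self.clock_enable = True

        def watch(self, name, getter):
            self.watches[name] = getter

        def add_assertion(self, name, predicate):
            self.assertions.append((name, predicate))

        def on_cycle(self, cycle):
            """Called once per cycle; returns False to gate the clock."""
            sample = {n: g() for n, g in self.watches.items()}
            self.history.append(sample)            # deep trace enables "rewind"
            if cycle in self.breakpoints:
                self.clock_enable = False           # halt in the same cycle
            for name, pred in self.assertions:
                if not pred(sample):
                    print(f"assertion '{name}' failed at cycle {cycle}")
                    self.clock_enable = False
            return self.clock_enable

In hardware the same decision would drive a gated clock buffer (e.g. a BUFGCE primitive) rather than a boolean flag, but the control flow (sample, log, compare, gate) is the one the abstract describes.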
ISBN: (Print) 9781595939340
The problem of hardware-software codesign for embedded systems using configurable architectures has been studied extensively in the past decade. In this work we studied the feasibility of utilizing Commercial Off-The-Shelf (COTS) FPGA systems for codesign. We partitioned the implementation of a set of benchmark applications between hardware and software and studied the performance and resource consumption in the system. The results of the experiments demonstrated that the communication between the processor and the reconfigurable architecture is the major hurdle in codesign, especially when using COTS systems-on-chip. It is demonstrated that although implementing algorithms in hardware can lead to enormous speedup, the communication overhead for transferring data variables between the configurable architecture and the processor can destroy all of the achieved speedup. We especially showed that in COTS FPGAs this bottleneck is more restrictive because of the weak communication structure between different IPs. Furthermore, analyzing the experimental results, we propose a partitioning mechanism; the evaluation results show that the achieved speedup using the proposed partitioning mechanism is between 2 and 300, depending on the application's data dependency.
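The claim that communication overhead can cancel out hardware acceleration is easy to see with an Amdahl-style calculation. The sketch below uses made-up numbers (they are not measurements from this paper) purely to illustrate the effect:

    # Illustrative only: effective speedup of a hardware-accelerated kernel
    # once the processor<->fabric transfer time is charged against it.

    def effective_speedup(t_sw, t_hw, bytes_moved, bw_bytes_per_s):
        """t_sw: software runtime of the kernel (s)
           t_hw: hardware runtime of the kernel (s)
           bytes_moved: data shipped to/from the accelerator
           bw_bytes_per_s: sustained processor<->IP bandwidth"""
        t_comm = bytes_moved / bw_bytes_per_s if bytes_moved else 0.0
        return t_sw / (t_hw + t_comm)

    # A kernel that is 50x faster in fabric when transfers are free...
    print(effective_speedup(t_sw=1.0, t_hw=0.02, bytes_moved=0, bw_bytes_per_s=1))            # 50.0
    # ...but shipping 100 MB over a 100 MB/s on-chip link erases the gain.
    print(effective_speedup(t_sw=1.0, t_hw=0.02, bytes_moved=100e6, bw_bytes_per_s=100e6))    # ~0.98

The partitioning question then becomes one of keeping t_comm small relative to the saved compute time, which is exactly where the paper reports COTS interconnect to be the limiting factor.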
ISBN: (Print) 9781595939340
A decoder is a hardware module that expands an x-bit input into an n-bit output, where x << n. It can be viewed as producing a set S of subsets of an n-element set Zn. If this set S can be altered by the user, the decoder is said to be configurable. We propose a class of configurable decoders (called "mapping-unit" based decoders or simply MU-decoders) that facilitate efficient selection of elements in an FPGA (in general, in any chip). Conventional solutions for this selection use either (a) a fixed (non-reconfigurable) decoder that lacks the flexibility to generate many subsets quickly, or (b) a large look-up table (LUT), which is flexible but too expensive. The proposed class of MU-decoders has much of the flexibility of the large-LUT solution (also called a LUT decoder here) at the cost of the fixed-decoder solution. Specifically, we show that for any fixed gate cost, an MU-decoder can produce any set of subsets that the LUT decoder can; in addition, the MU-decoder can exploit any available structure in the application at hand to produce many more subsets than the LUT decoder. We illustrate this ability in the context of totally ordered sets of subsets.
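The "set of subsets" view of a decoder is concrete enough to sketch. Below, a configurable LUT decoder is simply a table of n-bit masks indexed by the x-bit input, and the totally ordered case (each subset containing the previous one) appears as nested prefix masks. The internals of the MU-decoder are not spelled out in the abstract, so nothing here should be read as its actual construction:

    # A decoder maps an x-bit input to one n-bit output word; the set of all
    # outputs it can produce is a set S of subsets of Zn = {0, ..., n-1}.

    def lut_decoder(table):
        """Fully flexible but costly: one stored n-bit mask per input code."""
        return lambda x: table[x]

    def subset_to_mask(subset, n):
        mask = 0
        for i in subset:
            mask |= 1 << i
        return mask

    n = 16
    # A totally ordered family of subsets: S0 < S1 < S2 < ... (prefixes of Zn).
    chain = [set(range(k)) for k in range(0, n + 1, 4)]   # sizes 0, 4, 8, 12, 16
    table = [subset_to_mask(s, n) for s in chain]
    dec = lut_decoder(table)
    print(format(dec(3), f'0{n}b'))   # 0000111111111111 -> subset {0..11}

Structure such as this total order is exactly what the LUT decoder cannot take advantage of, since it stores every mask explicitly; it is the kind of structure the MU-decoder is said to exploit at a fixed gate cost.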
ISBN: (Print) 9781595939340
With the increase in system complexity, designers are increasingly using IP blocks as a means for filling the designer productivity gap. This has given rise to system-level languages which connect IP blocks together. However, these languages have in general not been subject to formalisation. They are considered too trivial to justify the formalisation effort. Unfortunately, the lack of formality in these languages can give rise to errors that are not caught until late in the design cycle. We present a type system for static typing of such a system-level language. We argue that the proposed type system will eliminate an important class of errors currently permitted by existing system-level languages. A comparison is made against existing tools and we show that the type checker detects errors earlier in the design flow. This reduces synthesis iterations and decreases the time to market.
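The class of errors at stake is easy to picture: two IP blocks wired together with mismatched port widths or directions, which a purely structural connection language accepts and synthesis only rejects much later. A toy static check in Python (the paper's actual type system and target language are not reproduced here; Port and check_connection are invented names) conveys the idea:

    # Hypothetical port/connection model; a real system-level language would
    # carry richer types (clock domains, protocols, parameterised widths).

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Port:
        name: str
        width: int
        direction: str   # "in" or "out"

    def check_connection(src: Port, dst: Port):
        errors = []
        if src.direction != "out" or dst.direction != "in":
            errors.append(f"{src.name} -> {dst.name}: direction mismatch")
        if src.width != dst.width:
            errors.append(f"{src.name} -> {dst.name}: width {src.width} != {dst.width}")
        return errors

    fifo_out = Port("fifo.data_out", 32, "out")
    dsp_in   = Port("dsp.sample_in", 16, "in")
    print(check_connection(fifo_out, dsp_in))
    # ['fifo.data_out -> dsp.sample_in: width 32 != 16']

Catching this at elaboration time, rather than after a synthesis run, is the "earlier in the design flow" benefit the abstract claims.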
ISBN: (Print) 9781595939340
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies re-implement in the FPGA the operators present in a processor. This conservative approach is relatively straightforward, but it doesn't exploit the greater flexibility of the FPGA. We survey the many ways in which the FPGA implementation of a given floating-point computation can be not only faster, but also more accurate than its microprocessor counterpart. Techniques studied here include custom precision, mixing and matching fixed- and floating-point, specific accumulator design, dedicated architectures for coarser operators implemented as software in processors (such as elementary functions or Euclidean norms), operator specialization such as constant multiplication, and others. The FloPoCo project (http://***/LIP/Arenaire/Ware/FloPoCo/) aims at providing such non-standard operators. As a conclusion, current FPGA fabrics could be enhanced to improve floating-point performance. However, these enhancements should not take the form of hard FPU blocks as others have suggested. Instead, what is needed is smaller building blocks more generally useful to the implementation of floating-point operators, such as cascadable barrel shifters and leading-zero counters.
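One of the techniques listed, a dedicated accumulator wider than its summands, is easy to demonstrate in software. The sketch below is illustrative only (it is not FloPoCo's architecture): it sums many small single-precision values into (a) a single-precision running sum, where rounding error accumulates, and (b) a wide fixed-point register, which an FPGA designer is free to size to the problem:

    import numpy as np

    terms = [np.float32(0.1)] * 1_000_000

    # (a) Accumulate in the same precision as the summands (what a plain FPU does).
    acc_f32 = np.float32(0.0)
    for t in terms:
        acc_f32 = np.float32(acc_f32 + t)

    # (b) Accumulate in a wide fixed-point register, modelled here as an integer
    #     scaled by 2^-32; custom hardware can afford such a register.
    FRAC_BITS = 32
    acc_fx = 0
    for t in terms:
        acc_fx += round(float(t) * (1 << FRAC_BITS))
    result_fx = acc_fx / (1 << FRAC_BITS)

    print(acc_f32)     # visibly drifts away from the exact sum ~100000.00149
    print(result_fx)   # ~100000.00149, the exact sum of the float32 inputs

How far (a) drifts depends on the summation order and magnitudes; the point is only that on an FPGA the accumulator format is a design parameter rather than an inherited processor format.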
ISBN: (Print) 9781595939340
QR decomposition is used in many signal processing applications. We have implemented a systolic array QR decomposition on a Xilinx Virtex-5 FPGA using the Givens rotation algorithm. It uses a truly two-dimensional systolic array architecture, so latency scales well for large matrices. To accommodate the dynamic range of input data, floating-point arithmetic is chosen, using the Northeastern University Variable Precision Floating-Point (VFloat) library. We support any general floating-point format, including IEEE single precision. Our design uses straightforward floating-point divide and square root implementations, in contrast to prior work which uses special operations or formats such as CORDIC or the logarithmic number system (LNS). This makes our design more standard and portable to different systems, and thus easier to fit into a larger design. We support square, tall and short matrices. The input matrix size can be configured at compile time to virtually any size. Therefore, the design can be easily scaled to future larger FPGA devices, or over multiple FPGAs. The QR module is fully pipelined with a throughput of over 130 MHz for the IEEE single-precision floating-point format. A peak performance of 35 GFlops is achieved for a 12 by 12 matrix with this implementation.
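Givens-rotation QR lends itself to a systolic array because each rotation touches only two rows and can be computed and applied locally. A plain software reference of the same algorithm (standard textbook Givens QR in Python, not the paper's VFloat pipeline) looks like this:

    import numpy as np

    def givens_qr(A):
        """QR decomposition by Givens rotations: returns Q, R with A = Q @ R."""
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = np.eye(m)
        R = A.copy()
        for j in range(n):                    # zero out column j below the diagonal
            for i in range(m - 1, j, -1):
                a, b = R[i - 1, j], R[i, j]
                r = np.hypot(a, b)
                if r == 0.0:
                    continue
                c, s = a / r, b / r           # rotation annihilating R[i, j]
                G = np.array([[c, s], [-s, c]])
                R[i - 1:i + 1, :] = G @ R[i - 1:i + 1, :]
                Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G.T
        return Q, R

    A = np.random.rand(12, 12)
    Q, R = givens_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))   # True True

In the usual systolic formulation, boundary cells compute the (c, s) pairs and internal cells apply them to the rows streaming past, which is what lets latency scale gracefully as the matrix grows.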
ISBN: (Print) 9781595939340
This poster presents an in-depth analysis of the Xilinx bitstream file format. This theoretical analysis is backed by a simple and efficient implementation of a reverse-engineering tool for Xilinx bitstreams. The development process followed these lines. First, publicly available documentation from Xilinx was analyzed; then some custom assumptions about the bitstream format were made. This information allowed a suitable algorithm to be run on well-chosen bitstreams. The output of this automated analysis step is a database which relates raw bitstream data to low-level netlist elements. This database is subsequently used as input to an efficient bitstream compiler which can either generate a bitstream from a low-level (XDL) description of the netlist or, conversely, decompile any given bitstream to its low-level netlist elements. This work has been validated for the Spartan-3, Virtex-II, Virtex-4 and Virtex-5 FPGA lines from Xilinx. Decompiling a bitstream is very fast; it is two orders of magnitude faster than the reverse operation of compilation with Xilinx's bitgen. This work aims to raise awareness about security issues for users of FPGAs. It also makes custom compilation and low-level tinkering with bitstreams - à la JBits - possible.
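The abstract does not detail its analysis algorithm, but the generic way to build such a bit-to-netlist database is by differencing well-chosen bitstreams: generate designs that differ in exactly one netlist element and record which bit positions change. A minimal sketch of that idea, with all names hypothetical:

    # Generic bit-to-feature correlation: compare the bitstream of a design
    # containing one extra netlist feature against a baseline bitstream and
    # record the flipped bit positions. Treating the bitstream as a flat byte
    # string is a simplification for illustration.

    def diff_bits(base: bytes, variant: bytes):
        """Return the set of absolute bit positions that differ."""
        assert len(base) == len(variant)
        flipped = set()
        for i, (a, b) in enumerate(zip(base, variant)):
            x = a ^ b
            for bit in range(8):
                if x & (1 << bit):
                    flipped.add(8 * i + bit)
        return flipped

    def build_database(baseline: bytes, experiments):
        """experiments: iterable of (feature_name, bitstream) pairs."""
        return {feature: diff_bits(baseline, bs) for feature, bs in experiments}

A real tool must additionally handle frame addressing, headers and CRC words rather than a flat bit array, but with such a database in hand, decompilation reduces to pattern lookups over the target bitstream, which is consistent with the reported two-orders-of-magnitude speed advantage over the forward bitgen flow.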
ISBN: (Print) 9781595939340
Modern FPGAs can implement large, custom compute engines that are designed to exploit extreme amounts of parallel computation. Through parallelism, these systems achieve orders of magnitude higher performance than the fastest microprocessors. Building such custom compute engines with existing hardware design languages is too difficult and time-consuming. For this to become mainstream technology, the task of designing such parallel systems must be as simple as possible. Thus, high-level languages are needed which can specify a custom compute engine or be compiled to run on predesigned parallel systems. In this workshop, we will examine several approaches for specifying extremely parallel computations in high-level languages. These can be used to build parallel systems in FPGAs, or they can be used to specify parallel computations in other competing architectures. By examining several different approaches, one gains insight into the best approach for solving a given problem. Ideally, this will also inspire new approaches for designing with extreme parallelism.
ISBN: (Print) 9781595939340
This article presents a purely mathematical analysis based on generic models; the idea is to investigate the possibility of using tiling patterns other than the Manhattan grid in FPGAs. The goal of our research is to evolve FPGA architectures with advances in technology, and specifically to make better use of available interconnect layers. We propose a method to evaluate tiling patterns based on first principles (i.e., Rent's rule, Donath's result, and the equivalence of wire flux and wire length). We show that the use of tiling patterns formed with higher-order polygons can improve the speed and area performance of an FPGA. This gain is highly dependent on depopulation schemes and other parameters. However, for generic tiling patterns with crossbar switchboxes, there is a 22% gain in area for the hexagonal tiling pattern and a 30% gain in area for the octagonal tiling pattern. Moreover, the average interconnect length is around 15% shorter for hexagonal and 31% shorter for octagonal tiling compared to square tiling. We can expect a proportional increase in speed. We also present a comparative plot of total interconnect lengths for these tiling patterns and for hierarchical gate arrays. The physical realizability of these tiling patterns in CMOS remains to be investigated. We present a layout scheme for both hexagonal and octagonal FPGAs. To our knowledge, standard processes support 45° metal lines, whereas 60° lines can be etched using non-standard processes. We must keep in mind that in practice one must use some sort of depopulation and staggering scheme, and that these results provide only an idea of the gains that can be achieved. The actual interconnect structure is of course dependent on several factors (e.g., available interconnect layers, difficulty of fabrication, required speed/area, evolution of CMOS technology, etc.). Our future research direction will be to choose an efficient interconnect strategy based on this and previous research, as well as experimental results.
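For readers who want the cited "first principles" spelled out, the two standard statements are, in their commonly quoted form (a generic restatement, not this article's specific model):

    % Rent's rule: terminal count of a block of g logic cells
    T = t \, g^{p}, \qquad 0 < p < 1

    % Donath's result: asymptotic average interconnect length of an N-cell
    % 2D placement obeying Rent's rule with exponent p
    \bar{L} \;\propto\;
    \begin{cases}
      N^{\,p - 1/2}, & p > 1/2, \\
      \log N,        & p = 1/2, \\
      O(1),          & p < 1/2.
    \end{cases}

Here T is the number of external terminals of a block containing g cells, t is the average number of terminals per cell, and p is the Rent exponent. The article's evaluation of hexagonal and octagonal tilings builds on these relations together with the stated equivalence of wire flux and wire length.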