ISBN: 1595932925 (print)
Due to their generic and highly programmable nature, FPGAs can implement a wide range of applications. However, it is this nonspecific nature that has limited the use of FPGAs in scientific applications that require floating-point arithmetic. Even simple floating-point operations consume a large amount of computational resources. In this paper, we introduce embedding floating-point multiply-add units in an island-style FPGA. This has been shown to yield an average area savings of 55.0% and an average increase of 40.7% in clock rate over existing architectures. Copyright 2006 ACM.
ISBN: 9781605584102 (print)
In this paper we present an implementation of a Cholesky decomposition core with IEEE 754 single-precision arithmetic. The datapaths are generated using fused datapath synthesis, created with an experimental floating-point compiler tool capable of fitting hundreds of floating-point operators into a single device. We present a scalable architecture for both real and complex matrices, for which we report results on real matrices of up to 128×128. The concepts of fused datapath synthesis for FPGA floating-point designs are reviewed, and the application to the Cholesky algorithm is detailed. Experimental results show that the accuracy of this method is superior to that expected from a traditional design flow based on IEEE 754 cores. Copyright 2009 ACM.
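For readers unfamiliar with the algorithm named in the abstract above, the following is a minimal software sketch of the textbook Cholesky factorization of a real symmetric positive-definite matrix. It is not the paper's fused-datapath hardware design: it runs in Python's double precision rather than IEEE 754 single precision, and the function name `cholesky` and the example matrix are illustrative assumptions.

```python
import math


def cholesky(a):
    """Return lower-triangular L such that L * L^T equals the matrix a.

    Textbook column-by-column method; `a` is assumed to be a real,
    symmetric, positive-definite matrix given as a list of rows.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the squares of the entries already in row j.
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            dot = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - dot) / L[j][j]
    return L


# Illustrative 3x3 input (an assumption, not data from the paper).
A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = cholesky(A)
```

The square root and division on the diagonal sit on the critical path of each column, which is one reason the operation is a demanding target for floating-point hardware.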
ISBN: 9781595936004 (print)
Reconfigurable computing can provide a significant speed-up factor to cryptographic and error-correcting code algorithms. Finite field arithmetic is essential to both, but is difficult to implement efficiently. Finite field instruction set extensions and a reconfiguration framework have been constructed to enable a finite field multiplier to be regenerated via software control. A performance evaluation has been carried out by generating a Finite Field Extensions Unit with a MicroBlaze processor in a Xilinx Virtex-II Pro FPGA. By utilizing the in-system partial reconfiguration capability, the finite field multiplier can be customized to a particular size and definition. With a customized GF(2^163) multiplier, a speed-up factor of 1530x has been demonstrated versus execution of the same algorithm on the MicroBlaze processor alone.
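To illustrate the kind of operation such a multiplier performs, here is a minimal bit-serial sketch of multiplication in GF(2^m) in software. The function name `gf2m_mul` is hypothetical, and the abstract above does not say which reduction polynomial ("definition") was used: the 163-bit polynomial below is the NIST pentanomial x^163 + x^7 + x^6 + x^3 + 1, chosen here purely as an assumption, and the GF(2^8) AES polynomial is included only as a small, easily checked case.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m), polynomial basis, shift-and-add method.

    a, b : field elements as integers below 2**m (bit i = coefficient of x^i)
    poly : reduction polynomial as an integer, including its x^m term
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if (a >> m) & 1:
            a ^= poly        # reduce when the degree reaches m
    return result


# Small check case: GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1.
AES_POLY = 0x11B
small_product = gf2m_mul(0x57, 0x83, AES_POLY, 8)

# Assumed polynomial for GF(2^163); the paper's actual choice is not stated.
NIST_163_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
wrapped = gf2m_mul(2, 1 << 162, NIST_163_POLY, 163)  # x * x^162 = x^163 mod poly
```

In software this loop needs on the order of m iterations per product, each operating on integers wider than a 32-bit processor word when m = 163, which gives some intuition for why a dedicated, field-specific hardware multiplier can be so much faster.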
ISBN: 9781605584102 (print)
Clock network power in field-programmable gate arrays (FPGAs) is considered, and two complementary approaches for power reduction in the Xilinx Virtex-5 FPGA are presented. The approaches are unique in that they leverage specific architectural aspects of Virtex-5 to achieve reductions in dynamic power consumed by the clock network. The first approach comprises a placement-based technique to reduce interconnect resource usage on the clock network, reducing capacitance and power (up to 12%). The second approach borrows the "clock gating" notion from the ASIC domain and applies it to FPGAs. Clock enable signals on flip-flops are selectively migrated to use the dedicated clock enable available on the FPGA's built-in clock network, leading to reduced toggling on the clock interconnect and lower power (up to 28%). Power reductions are achieved without any performance penalty, on average. Copyright 2009 ACM.
ISBN: 9781450361378 (print)
With increasing interest in cloud FPGAs, such as Amazon's EC2 F1 instances or Microsoft's Azure with Catapult servers, FPGAs in cloud computing infrastructures can become targets for information leakage via covert channel communication. Cloud FPGAs leverage temporal sharing of the FPGA resources between users. This paper shows that heat generated by one user can be observed by another user who later uses the same FPGA. The covert data transfer can be achieved through simple on-off keying (OOK), and the use of multiple FPGA boards in parallel significantly improves data throughput. The new temporal thermal covert channel is demonstrated on Microsoft's Catapult servers with FPGAs running remotely in the Texas Advanced Computing Center (TACC). A number of defenses against the new temporal thermal covert channel are presented at the end of the paper.
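The on-off keying mentioned in the abstract above can be sketched in a few lines. This is only an illustration of the modulation idea, not of the paper's thermal setup: the sample levels, the number of samples per bit, the threshold, and the function names `ook_encode` and `ook_decode` are all assumptions, and no heating or cooling dynamics are modeled.

```python
def ook_encode(bits, samples_per_bit=4, hot=1.0, idle=0.0):
    """Map each bit to a run of 'on' (hot) or 'off' (idle) sample levels."""
    signal = []
    for bit in bits:
        signal.extend([hot if bit else idle] * samples_per_bit)
    return signal


def ook_decode(signal, samples_per_bit=4, threshold=0.5):
    """Average each bit period and compare the mean with a threshold."""
    bits = []
    for start in range(0, len(signal) - samples_per_bit + 1, samples_per_bit):
        window = signal[start:start + samples_per_bit]
        bits.append(1 if sum(window) / len(window) > threshold else 0)
    return bits


message = [1, 0, 1, 1, 0, 0, 1, 0]
received = ook_decode(ook_encode(message))
```

Averaging over a whole bit period makes the decision tolerant of modest measurement noise, at the cost of a lower bit rate; sending independent bit streams over several boards at once is one way to recover throughput, as the abstract notes.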
C-slow retiming is a process of automatically increasing the throughput of a design by enabling fine-grained pipelining of problems with feedback loops. This transformation is especially appropriate when applied to FPGA designs because of the large number of available registers. To demonstrate and evaluate the benefits of C-slow retiming, we constructed an automatic tool which modifies designs targeting the Xilinx Virtex family of FPGAs. Applying our tool to three benchmarks (AES encryption, Smith/Waterman sequence matching, and the LEON 1 synthesized microprocessor core), we were able to substantially increase the total throughput. For some parameters, throughput is effectively doubled.
Current FPGA placement algorithms estimate the routability of a placement using architecture-specific metrics. The shortcoming of using architecture-specific routability estimates is limited adaptability. A placement algorithm that is targeted to a class of architecturally similar FPGAs may not be easily adapted to other architectures. The subject of this paper is the development of a routability-driven, architecture-adaptive FPGA placement algorithm called Independence. The core of the Independence algorithm is a simultaneous place-and-route approach that tightly couples a simulated annealing placement algorithm with an architecture-adaptive FPGA router (Pathfinder). The results of our experiments demonstrate Independence's adaptability to island-style and hierarchical FPGA architectures. The quality of the placements produced by Independence is within 5% of the quality of VPR's placements and 17% better than the placements produced by HSRA's place-and-route tool. Further, our results show that Independence produces clearly superior placements on routing-poor island-style FPGA architectures.
Three factors are driving the demand for rapid field-programmable gate array (FPGA) compilation. First, as FPGAs grow in logic capacity, the compile computation grows more quickly than the compute power of the available computers. Second, there exists a subset of users who are willing to pay for very high-speed compile with a decrease in quality of result. Third, very high-speed compile is a long-standing desire of those using FPGA-based custom computing machines, as they want compile times at least closer to those of regular computers. A routing algorithm and routing tool that relates these three unique capabilities to very high-speed compile is presented.
ISBN: 9781595930293 (print)
FPGA technology has become widely used for real-time network intrusion detection. In this paper, a novel packet classification architecture called BV-TCAM is presented, which is implemented for an FPGA-based Network Intrusion Detection System (NIDS). The classifier can report multiple matches at gigabit-per-second network link rates. The BV-TCAM architecture combines the Ternary Content Addressable Memory (TCAM) and the Bit Vector (BV) algorithm to effectively compress the data representations and boost throughput. A tree-bitmap implementation of the BV algorithm is used for source and destination port lookup, while a TCAM performs the lookup of the other header fields, which can be represented as a prefix or exact value. The architecture eliminates the requirement for prefix expansion of port ranges. With the aid of a small embedded TCAM, packet classification can be implemented in a relatively small part of the available logic of an FPGA. The design is prototyped and evaluated in a Xilinx XCV2000E FPGA on the FPX platform. Even with the most difficult set of rules and packet inputs, the circuit is fast enough to sustain OC48 traffic throughput. Using larger and faster FPGAs, the system can work at speeds greater than OC192. Copyright 2005 ACM.
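The bit-vector idea behind the classifier described above can be sketched in software as follows. This shows only the generic principle (one bit per rule, a per-field match vector, and a bitwise AND across fields); it does not model the paper's tree-bitmap or TCAM structures, the per-field vectors are recomputed on every call instead of being precomputed, and the rule set and the names `RULES`, `field_vector`, and `classify` are hypothetical.

```python
# Hypothetical rule set: (protocol, (dst_port_low, dst_port_high)).
# A protocol of None acts as a wildcard.
RULES = [
    ("tcp", (80, 80)),
    ("tcp", (0, 1023)),
    ("udp", (53, 53)),
    (None, (0, 65535)),
]


def field_vector(predicate):
    """Return an integer whose bit i is set when rule i matches on one field."""
    vec = 0
    for i, rule in enumerate(RULES):
        if predicate(rule):
            vec |= 1 << i
    return vec


def classify(protocol, dst_port):
    """Return the indices of all rules that match both header fields."""
    proto_vec = field_vector(lambda r: r[0] is None or r[0] == protocol)
    port_vec = field_vector(lambda r: r[1][0] <= dst_port <= r[1][1])
    both = proto_vec & port_vec      # a rule matches only if every field matches
    return [i for i in range(len(RULES)) if (both >> i) & 1]


web_matches = classify("tcp", 80)
dns_matches = classify("udp", 53)
```

Because every set bit in the combined vector is reported, the scheme naturally yields multiple matches per packet, and port ranges are tested directly rather than being expanded into prefixes.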
This paper presents a new universal test approach for FPGA logic resources. It includes a new greedy configuration-generating algorithm and a new FPGA Configurable Logic Block (CLB) test model. The model is based on two directed graphs, a structure graph and a configuration graph, which convey the important information from the CLB gate-level circuit to the greedy configuration-generating algorithm, so that the algorithm can generate the minimum number of test configurations needed to achieve a given fault coverage. With this new approach, researchers can easily obtain test patterns optimized in both test time and fault coverage for different FPGA architectures. Finally, we compare experimental results with other test approaches; the results show that test patterns from the new approach are even more efficient than patterns from manual optimization. The results also show that the approach handles different types of FPGAs well.
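As a rough illustration of greedy selection toward a coverage target, the sketch below uses the generic greedy set-cover heuristic as a stand-in. It is not the paper's algorithm, which works from the structure and configuration graphs; the input format (a mapping from configuration name to the set of faults it detects), the fault numbering, and the name `greedy_configurations` are assumptions made for this example.

```python
def greedy_configurations(coverage, target=1.0):
    """Pick test configurations greedily until the target coverage is reached.

    coverage : dict mapping a configuration name to the set of faults it detects
    target   : required fraction of all detectable faults (1.0 = all of them)
    Returns (chosen configuration names in order, set of faults covered).
    """
    all_faults = set().union(*coverage.values())
    covered, chosen = set(), []
    while len(covered) < target * len(all_faults):
        # Take the configuration that detects the most not-yet-covered faults;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda c: len(coverage[c] - covered))
        gain = coverage[best] - covered
        if not gain:
            break                    # nothing new can be covered
        chosen.append(best)
        covered |= gain
    return chosen, covered


# Hypothetical fault lists for four candidate configurations.
candidates = {
    "cfg_a": {1, 2, 3},
    "cfg_b": {3, 4},
    "cfg_c": {4, 5, 6, 7},
    "cfg_d": {1, 7},
}
chosen, covered = greedy_configurations(candidates)
```

A plain greedy heuristic like this one usually produces a small set but does not guarantee the true minimum in general, so it should be read as an approximation of the goal stated in the abstract.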