ISBN: 9781424410590 (print)
FPGA CAD tools require wirelength predictions to make informed decisions throughout the clustering, placement and routing stages in pursuit of power-, area- or delay-based design goals. Unfortunately, minimal work has been devoted to estimating individual wirelengths early in the CAD flow. Rent's rule can be used to generate a wirelength distribution but cannot be used to predict the lengths of individual wires. Hence, this paper explores "structural metrics" that have been found to possess strong predictive qualities in the ASIC domain. To our knowledge, this is the first study of the application of these metrics in the FPGA CAD flow. Results show that the studied metrics capture characteristics of the placement optimization carried out by VPR and hence are good indicators of post-placement wirelengths.
Reconfigurable logic devices are classified as fine-grained or coarse-grained on the basis of their basic logic cell architecture. In general, each architecture has its own merits; therefore, it is difficult to achieve a balance between operation speed and implementation area across various applications. In this paper, we propose a Variable Grain Logic Cell (VGLC) architecture, which consists of a 4-bit ripple carry adder with configuration memory bits, and we also develop a technology mapping tool. Its key feature is variable granularity, a trade-off between the coarse-grained and fine-grained types required for the implementation of arithmetic and random logic, respectively. As a result, the critical path delay and the number of configuration memory bits are reduced by 49.7% and 48.5%, respectively, in the benchmark circuits.
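As a point of reference for the arithmetic building block named in the abstract above, the following is a minimal behavioral sketch of a generic 4-bit ripple carry adder. It is not the VGLC itself: the configuration memory bits and the variable-granularity logic are not modeled, and the function names `full_adder` and `ripple_carry_add4` are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add4(x, y, carry_in=0):
    """Add two 4-bit values by chaining four full adders.

    The carry ripples from bit 0 to bit 3, which is why the carry
    chain usually sits on the critical path of such a cell.
    Returns (4-bit sum, carry_out).
    """
    carry = carry_in
    total = 0
    for i in range(4):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total, carry
```

For example, `ripple_carry_add4(9, 8)` yields a 4-bit sum of 1 with a carry out of 1, that is, 17.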
We demonstrate a hybrid reconfigurable cluster-on-chip architecture with a cross-platform Message Passing Interface (MPI), a cross-platform parallel image processing library and a sample application. We describe the system, network architecture, MPI library and the parallel image processing library implementations. We validate the performance, scalability and suitability of MPI as a software interface to enable cross-platform application parallelism on reconfigurable hybrid cluster-on-chip systems and desktop cluster systems. The presented results are promising, showing the suitability, scalability and performance of parallelisation of image processing algorithms with a cross-platform MPI implementation.
This paper develops a formal model of process migration that describes programs, processes, and the migration of those processes within a migration realm. A migration realm is a group of processors modeled as finite state machines. The model is motivated by a migration application between software and Field-Programmable Gate Array (FPGA) hardware, and the theorems of the model guide the use of FPGA resources while guaranteeing complete and correct execution of a process. By defining different types of migration realms, this paper also develops a migration realm taxonomy.
A new scalable systolic hardware architecture for RSA cryptosystems is presented. The kernel of the architecture can operate with different input precisions, which enables area-time trade-offs in the design. The add-shift Montgomery algorithm is used for modular multiplication. Unlike previous approaches, after the add operation the result is shifted to the previous systole to divide by the radix. This simplifies the structure of the processing elements. The R-L binary Montgomery exponentiation algorithm is used, and the square and multiply operations are performed in parallel. The architecture is implemented in Xilinx Virtex-5 FPGA (Field-Programmable Gate Array) chips for different radixes. The DSP48E slices in the FPGA chips are used to increase the throughput of the design. The results are compared with the literature. The highest performance per area is obtained with the radix-2^16 design.
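The two algorithms named in the abstract above can be sketched in software. The following is a minimal bit-serial (radix-2) illustration of add-shift Montgomery multiplication and right-to-left binary exponentiation. It is an assumption-laden sketch, not the paper's systolic design: the systole structure, higher radixes and DSP48E mapping are not modeled, and the function names are hypothetical.

```python
def montgomery_multiply(a, b, n, k):
    """Return a * b * 2^(-k) mod n for odd n, with a, b < n < 2^k."""
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    s = 0
    for i in range(k):
        if (a >> i) & 1:
            s += b          # add step: accumulate b for this bit of a
        if s & 1:
            s += n          # adding the odd modulus makes s even
        s >>= 1             # shift step: exact division by the radix 2
    if s >= n:              # s < 2n here, so one subtraction suffices
        s -= n
    return s


def montgomery_modexp(base, exponent, n):
    """Right-to-left binary exponentiation: base^exponent mod n, n odd."""
    k = n.bit_length()                    # R = 2^k > n
    r2 = pow(2, 2 * k, n)                 # R^2 mod n, used to enter Montgomery form
    x = montgomery_multiply(base % n, r2, n, k)   # base * R mod n
    result = montgomery_multiply(1, r2, n, k)     # 1 * R mod n
    e = exponent
    while e > 0:
        # The multiply and the square both read the old x and do not
        # depend on each other, so hardware can run them in parallel.
        if e & 1:
            result = montgomery_multiply(result, x, n, k)
        x = montgomery_multiply(x, x, n, k)
        e >>= 1
    return montgomery_multiply(result, 1, n, k)   # leave Montgomery form
```

A quick check is to compare `montgomery_modexp(4, 13, 497)` with Python's built-in `pow(4, 13, 497)`.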
A method is described for enumerating the frequencies of DNA subsequences on a system comprising a host computer and a field-programmable gate array (FPGA) board with one FPGA. Frequencies of subsequences with lengths of up to K_0 + K_1 + K_2 (24 in the current implementation) are enumerated in three phases. In these three phases, subsequences with lengths of up to K_0, K_0 + K_1, and K_0 + K_1 + K_2, respectively, are enumerated; the three phases are executed simultaneously on a pipelined circuit, resulting in high performance. Enumerating frequent subsequences in databases, which are becoming larger and larger, will make it possible to find subsequences that are unique and/or repeatedly used in many parts of the sequences.
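To make the task in the abstract above concrete, the following is a minimal software sketch of frequency enumeration. It assumes that "subsequence" means a contiguous substring (a k-mer) and counts every substring up to a maximum length in a single pass per length. It does not reproduce the three-phase pipelined FPGA circuit, and the function name `count_subsequences` is hypothetical.

```python
from collections import Counter


def count_subsequences(sequence, max_len):
    """Count occurrences of every contiguous substring of length 1..max_len."""
    counts = Counter()
    n = len(sequence)
    for length in range(1, max_len + 1):
        # Slide a window of the current length across the sequence.
        for start in range(n - length + 1):
            counts[sequence[start:start + length]] += 1
    return counts
```

For instance, `count_subsequences("ACGAC", 2)["AC"]` is 2, since "AC" occurs at positions 0 and 3. A naive approach like this grows quickly in memory with the maximum length, which is one motivation for dedicated hardware.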
In this paper, we present a new hardware design pattern for improving memory transfers to external dynamic memory in Altera's SOPC Builder tool by reusing the standard DMA IP core for all bulk memory transfers without the need for a CPU. The presented approach doubles the data throughput without the need for extra system resources. In addition, it is more effective for choosing optimal clock settings for the different components of the system on a programmable chip. The benefits and limitations of this new approach are illustrated with a real-world example: a bitplane assembler for scalable wavelet-based video. The new design is [...] times faster with the same clock settings as the original design and uses about 100 fewer logic elements. Applying our new approach also has a positive impact on energy consumption.
This paper presents and discusses the implementation of a barotropic operator used in an ocean model simulation called the Parallel Ocean Program (POP) on the SRC-6 MAP. While many high-end reconfigurable machines on which users can implement applications with a programming language are now available, not enough implementation experience has been accumulated for practical applications. In this paper, several implementation techniques, accompanied by modifications to the original application source code, are empirically evaluated and analyzed. The results show that appropriate use of internal memory and streaming DMA lets 100 MHz FPGAs achieve performance comparable to that of GHz processors.
Identifying and locating objects in images and videos, including elements like traffic signs, vehicles, buildings, and people, constitutes a fundamental and demanding task in computer vision, known as object detection...
In this paper, we propose a first step towards a time-predictable computer architecture for single-chip multiprocessing (CMP). CMP is the current trend in server and desktop systems. CMP is even being considered for embedded real-time systems, where worst-case execution time (WCET) estimates are of primary importance. We attack the problem of WCET analysis for several processing units accessing a shared resource (the main memory) with support from the hardware. In this paper, we combine a time-predictable Java processor and a direct memory access (DMA) unit with a regular access pattern (a VGA controller). We analyze and evaluate different arbitration schemes with respect to schedulability analysis and WCET analysis. We also implement the various combinations in an FPGA. An FPGA is the ideal platform to verify the different concepts and evaluate the results by running applications with an industrial background on real hardware.