ISBN (electronic): 9798331531881
ISBN (print): 9798331531898
The structure of the rocket-borne model is inherently complex: it processes high-resolution images and requires substantial amounts of data and computation. Achieving robust real-time computing on an embedded platform is therefore challenging because of strictly limited resources, power-consumption constraints, and size limitations. Our review of rocket-borne applications reveals considerable variability in the design resources of different devices, indicating a need for extensible design approaches. Evaluating existing methods, we identified two primary drawbacks. First, certain operators in the high-resolution target detection model are difficult to parallelize, causing significant inference delays that prevent the task requirements from being met. Second, although existing methods have been extended, their core scheduling yields poor acceleration, leaving significant room for performance improvement. To address these issues, this paper proposes an optimized accelerator architecture for target detection on high-resolution images, together with a novel, highly parallel data pre-processing and post-processing module implemented on FPGA. Compared with the ARM implementation, this architecture delivers a 24.64x performance improvement. Furthermore, to support flexible application across various rocket launch scenarios, we introduce an optimized structure for convolution, pooling, and fusion operators and a multi-core expansion optimization method. This approach improves computing-unit utilization by 1.29x compared with state-of-the-art multi-core scaling efforts. Finally, we evaluated the accelerator architecture on multiple FPGA platforms, achieving a peak processing-element utilization of 99.71% for a single core and layer. The overall computing efficiency, excluding the first layer, exceeded 90%. The peak computing power of the four cores reached 1638.4 GOPS, and the end-to-end computation time for
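The reported throughput figures can be related by simple arithmetic. The sketch below assumes, hypothetically, that each core is an array of $N_{\mathrm{PE}}$ multiply-accumulate units, each counted as two operations per cycle at clock frequency $f$. The abstract states neither quantity, so $N_{\mathrm{PE}} = 1024$ and $f = 200\ \mathrm{MHz}$ are only one illustrative combination that reproduces the per-core figure, not the authors' configuration. $C$ denotes the number of elapsed cycles.

```latex
\begin{aligned}
P_{\text{core}} &= \frac{P_{\text{4 cores}}}{4} = \frac{1638.4\ \text{GOPS}}{4} = 409.6\ \text{GOPS},\\
P_{\text{core}} &= 2\,N_{\mathrm{PE}}\,f = 2 \times 1024 \times 200\ \text{MHz} = 409.6\ \text{GOPS}
  \quad (\text{hypothetical } N_{\mathrm{PE}},\, f),\\
U &= \frac{\text{operations actually executed}}{2\,N_{\mathrm{PE}}\,C}.
\end{aligned}
```

Under this definition of utilization $U$, the reported 99.71% single-layer peak means that only about 0.29% of PE-cycles sit idle on that layer.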
ISBN (print): 9781424410590
This paper introduces a new flow that fits a parallel application onto an FPGA according to the FPGA's characteristics, such as computing power and I/Os. The flow is based on iterative refactoring and transformation of the application. VHDL code is generated from the resulting application, and this code is then used to simulate or synthesize the application. Substantial experiments have validated the approach.
ISBN (print): 9781424410590
A novel routing fabric is introduced that offers high flexibility at significantly lower silicon cost than the routing fabrics currently incorporated in many Field-Programmable Gate Array (FPGA) devices, IP cores, and IP-core wrappers. The novel fabric is constructed entirely from multiplexers and unidirectional point-to-point connections controlled by configuration bits, and it proves very efficient for mapping applications. For a fabric connecting 4-input look-up tables (LUTs), area savings of 60% are demonstrated when routing applications from the MCNC benchmark set.
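As a rough illustration of a fabric built only from multiplexers and unidirectional point-to-point connections, the sketch below models each routing output as a mux whose configuration bits select one of its candidate drivers. The names `MuxNode` and `propagate` and the binary select encoding are assumptions made for this example. They do not describe the paper's actual fabric.

```python
import math
from dataclasses import dataclass


@dataclass
class MuxNode:
    """One unidirectional routing output driven by a multiplexer.

    `sources` lists the wires this mux may select; `select` holds the value
    of its configuration bits, i.e. the index of the chosen source.
    """
    name: str
    sources: list
    select: int = 0

    @property
    def config_bits(self) -> int:
        # A binary-encoded k-input mux needs ceil(log2 k) configuration bits;
        # a wire with a single possible driver needs no mux at all.
        k = len(self.sources)
        return math.ceil(math.log2(k)) if k > 1 else 0

    def configure(self, source: str) -> None:
        self.select = self.sources.index(source)


def propagate(nodes, values):
    """Drive every mux output from its selected input.

    Nodes are listed in topological order (each point-to-point connection
    feeds a later node), so a single pass settles all wire values.
    """
    values = dict(values)
    for node in nodes:
        values[node.name] = values[node.sources[node.select]]
    return values
```

In this model, routing an application amounts to choosing every `select` value. The configuration memory is the sum of `config_bits` over all muxes. For example, a mux with three candidate sources needs two bits.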
ISBN (print): 9781424410590
This paper describes an execution cache that migrates processes between hardware and software contexts by means of run-time reconfiguration (RTR) of Field-Programmable Gate Arrays (FPGAs). The feasibility of such a system is demonstrated on existing FPGAs by accelerating a cycle-based simulation of a Register Transfer Level (RTL) design description. Because a common instruction set is used, each simulation process may run either in a software Virtual Machine (VM) or in a hardware Real Machine (RM). The implementation provides data for an empirical model used to examine the behavior of the parts of the system that have not yet been implemented.
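Cycle-based simulation evaluates every register's next value from the current state once per clock cycle and then commits all results together. The sketch below shows only the software (VM) side of that idea. The tiny instruction format `(dest, op, sources)`, the `OPS` table, and `run_vm` are hypothetical stand-ins for the paper's common instruction set, which the abstract does not specify.

```python
# Hypothetical "common instruction set": each instruction computes the next
# value of one register from the current register values.
OPS = {
    "add": lambda a, b: a + b,
    "and": lambda a, b: a & b,
    "mov": lambda a: a,
    "inc": lambda a: a + 1,
}


def run_vm(program, regs, cycles, width=8):
    """Cycle-based software simulation of a register-transfer description.

    Every cycle, all instructions read the *current* register values, and the
    results are committed together afterwards, mimicking clocked registers.
    """
    mask = (1 << width) - 1
    regs = dict(regs)
    for _ in range(cycles):
        nxt = {}
        for dest, op, srcs in program:
            nxt[dest] = OPS[op](*(regs[s] for s in srcs)) & mask
        regs.update(nxt)  # clock edge: commit all next-state values at once
    return regs
```

With a counter `cnt` and a register `prev` that copies it, three cycles from zero give `cnt == 3` and `prev == 2`. `prev` lags by one cycle because the commit happens only at the simulated clock edge.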
ISBN (print): 9781424410590
This paper presents an architecture for computing the atan(Y/X) operation, suitable for broadband communication applications that require a throughput of 20 MHz. The architecture is based on look-up table (LUT) methods and achieves lower power consumption and lower latency than an atan(Y/X) operator based on the CORDIC algorithm. The proposed architecture computes atan(Y/X) with a latency of two clock cycles, and its power consumption is 49% lower than that of a CORDIC with the same latency.
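To make the comparison concrete, the sketch contrasts a textbook CORDIC in vectoring mode, which needs one shift-and-add micro-rotation per bit of precision, with a generic table-based atan that folds the input into the first octant. The LUT variant is a common textbook approach, not the paper's architecture. The table size (`TABLE_BITS = 8`) is an arbitrary assumption. The floating-point division stands in for whatever ratio or reciprocal hardware a real design would use.

```python
import math

TABLE_BITS = 8
TABLE_SIZE = 1 << TABLE_BITS
# Precomputed ROM: atan(k / TABLE_SIZE) for ratios k / TABLE_SIZE in [0, 1].
ATAN_TABLE = [math.atan(k / TABLE_SIZE) for k in range(TABLE_SIZE + 1)]


def lut_atan(y, x):
    """Table-based atan(y/x) for x > 0, y >= 0 (first quadrant).

    Octant folding, atan(y/x) = pi/2 - atan(x/y) when y > x, keeps the
    table index within [0, 1]. Error is bounded by half a table step.
    """
    swap = y > x
    num, den = (x, y) if swap else (y, x)
    angle = ATAN_TABLE[round(num / den * TABLE_SIZE)]
    return math.pi / 2 - angle if swap else angle


def cordic_atan(y, x, iterations=16):
    """CORDIC vectoring mode for x > 0: rotate (x, y) toward the x-axis
    with shift-and-add micro-rotations, accumulating the rotation angle."""
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]  # constant ROM
    z = 0.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0  # rotation direction drives y toward 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return z
```

The design trade-off is visible in the loop structure. CORDIC needs one iteration, or one pipeline stage, per bit of precision, whereas the table version is a single lookup plus folding logic. This contrast is what makes a short fixed latency attractive for a LUT-based operator.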
ISBN (print): 9781424410590
FPGAs are currently a very important technology for implementing electronic systems because of their high logic density, fast time-to-market, and low cost. However, to provide high logic density, FPGA devices are fabricated in nanometer CMOS technology, which is becoming susceptible to radiation-induced soft errors. Among these errors, single-event transients (SETs) are those induced in the user's programmable logic. This paper presents a new fast adder, called RIC (Re-computing the Inverse Carry-in), and shows how this adder architecture can be used to build SET-tolerant fast adders. Results for FPGA-based implementations are presented.
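The abstract does not describe RIC's internals. The sketch below only illustrates a related, well-known time-redundancy identity that re-computation with an inverted carry-in can exploit, and treating it as RIC's exact mechanism is an assumption. For n-bit words, complementing both operands and the carry-in, adding, and complementing the result reproduces the original sum, with the carry-out also inverted. Comparing the two computations therefore exposes a transient that corrupts only one of them. The function names and the `fault_mask` fault-injection parameter are hypothetical.

```python
def add_n(a, b, cin, n):
    """n-bit adder model: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << n) - 1), total >> n


def recompute_with_inverse_carry(a, b, cin, n):
    """Second computation on complemented operands and the inverse carry-in.

    Identity: ~(~a + ~b + ~cin) == a + b + cin (mod 2**n), and the
    carry-out of the complemented addition is the inverse of the original.
    """
    mask = (1 << n) - 1
    s, c = add_n(~a & mask, ~b & mask, 1 - cin, n)
    return ~s & mask, 1 - c


def checked_add(a, b, cin, n, fault_mask=0):
    """Compute the sum twice and flag a mismatch.

    `fault_mask` flips sum bits of the first computation to emulate a
    single-event transient (hypothetical fault-injection hook).
    """
    s1, c1 = add_n(a, b, cin, n)
    s1 ^= fault_mask
    s2, c2 = recompute_with_inverse_carry(a, b, cin, n)
    return (s1, c1), (s1, c1) != (s2, c2)
```

Detection alone is shown here. Turning a detected mismatch into tolerance, for example by selecting or re-running a result, is beyond what the abstract states.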
ISBN (print): 9781424410590
QR decomposition, especially by means of Householder transformations, is often used to solve least-squares problems. A matrix decomposed with this method is usually very large, often too large to fit into the main memory of a workstation, let alone the internal memory of a present-day FPGA. Efficient out-of-core algorithms have been developed to factorize large matrices. This paper describes the application of variants of Householder QR decomposition on FPGA-based systems. More specifically, it investigates the issues of applying out-of-core algorithms to the relatively small internal memory architecture of FPGAs.
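The in-core Householder kernel that out-of-core variants build on is shown below as a minimal pure-Python sketch. Each step reflects one column onto a multiple of $e_k$. An out-of-core variant applies the same reflections panel by panel to blocks streamed from external memory, and this sketch does not model that tiling. Inputs are assumed to have $m \ge n$.

```python
import math


def householder_qr(a):
    """Householder QR of an m x n matrix (m >= n) given as a list of rows.

    Returns (q, r) with a == q @ r, q orthogonal (m x m), r upper triangular.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(n):
        # Householder vector v = x - alpha * e_k for the sub-column x.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue
        alpha = -norm_x if x[0] >= 0 else norm_x  # sign avoids cancellation
        v = x[:]
        v[0] -= alpha
        vv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to the trailing block of r ...
        for j in range(k, n):
            s = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                r[i][j] -= s * v[i - k]
        for i in range(k + 1, m):
            r[i][k] = 0.0  # entries below the diagonal are now zero
        # ... and accumulate Q = H_0 H_1 ... by applying H from the right.
        for i in range(m):
            s = 2.0 * sum(q[i][l] * v[l - k] for l in range(k, m)) / vv
            for l in range(k, m):
                q[i][l] -= s * v[l - k]
    return q, r
```

The least-squares solution of $Ax \approx b$ then follows by back-substituting $Rx = Q^{\mathsf T} b$.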
ISBN (print): 9781424410590
A high-speed FPGA off-loading engine that detects license plates in order to help avoid traffic accidents is proposed. A complicated algorithm is written in Handel-C, and parallel processing is explicitly exploited at every level of the implementation: an input image is segmented into 16 areas, and each area is processed in parallel by a multiple-calculation unit executing pipelined processing and a distributed memory module. A prototype circuit implemented on a general-purpose FPGA board achieved 4.16 times the performance of software execution on a Pentium III desktop PC. The highest performance reported in the literature, 100 frames per second, can be achieved.
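The 16-area decomposition is a plain data-parallel split: every area can be scored independently. The sketch below assumes a 4 x 4 grid and an image whose sides divide evenly. The per-area `edge_score` (summed horizontal intensity differences, since plate characters create many vertical edges) is a hypothetical feature, not the paper's algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, rows=4, cols=4):
    """Split a 2-D image (list of pixel rows) into rows x cols areas,
    in row-major order. Assumes height and width divide evenly."""
    h, w = len(image), len(image[0])
    th, tw = h // rows, w // cols
    return [
        [row[c * tw:(c + 1) * tw] for row in image[r * th:(r + 1) * th]]
        for r in range(rows)
        for c in range(cols)
    ]


def edge_score(tile):
    """Hypothetical per-area feature: sum of absolute horizontal differences.
    High scores mark areas likely to contain plate characters."""
    return sum(abs(row[i + 1] - row[i]) for row in tile for i in range(len(row) - 1))


def score_areas(image):
    tiles = split_into_tiles(image)
    # The areas are independent, so all 16 can be scored concurrently.
    with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
        return list(pool.map(edge_score, tiles))
```

The thread pool only mirrors the data decomposition. CPython threads do not speed up this CPU-bound work, whereas the FPGA engine gains its speed from dedicated parallel calculation units and distributed memories.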
ISBN (print): 9781424410590
This paper discusses the need for new high-speed hardware architectures for future networks, in particular high-speed, high-capacity shared buffer designs. An implementation of such a buffer using FPGA technology and RLDRAM II is presented. The architecture that was derived and implemented operated at 12.8 Gbps and is scalable up to 20 Gbps.
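In a shared buffer, all output ports draw cells from one common memory pool instead of owning fixed partitions. The usual bookkeeping is a free-cell list plus one FIFO of cell addresses per output port. The sketch below models that bookkeeping in software. The class name, the drop-on-full policy, and the use of a Python list in place of the RLDRAM II array are assumptions, because the abstract does not describe the paper's memory organization.

```python
from collections import deque


class SharedBuffer:
    """Shared-memory packet buffer: one pool of cells for all output ports.

    Free cells come from a common free list; each output port keeps a FIFO
    of the cell addresses holding its packets.
    """

    def __init__(self, num_cells, num_ports):
        self.memory = [None] * num_cells      # stands in for the external DRAM
        self.free = deque(range(num_cells))   # free-cell list
        self.queues = [deque() for _ in range(num_ports)]

    def enqueue(self, port, packet):
        if not self.free:
            return False                      # buffer full: drop the packet
        addr = self.free.popleft()
        self.memory[addr] = packet
        self.queues[port].append(addr)
        return True

    def dequeue(self, port):
        if not self.queues[port]:
            return None
        addr = self.queues[port].popleft()
        packet, self.memory[addr] = self.memory[addr], None
        self.free.append(addr)                # cell returns to the shared pool
        return packet
```

A cell freed by one port is immediately reusable by any other port. This lets a shared buffer absorb bursts on a single output better than a statically partitioned one of the same total size.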