ISBN: (print) 9781538674796
The N-Body simulation process describes the evolution of a system of N bodies under their mutual forces; the bodies may represent celestial objects, molecules, and so on. The most accurate algorithm for N-Body simulation, the All-Pairs method, is particularly compute intensive, and software implementations on CPUs are inefficient in terms of both performance and power consumption. An implementation on a hardware accelerator, such as an FPGA, would improve on both counts by exploiting parallel execution at a relatively low power profile. Moreover, it would also benefit faster methods with lower computational complexity, since many of them rely on the All-Pairs approach to approximate the calculation of forces. This work proposes a highly scalable, power-efficient and high-performance hardware architecture for the N-Body All-Pairs simulation problem. Our final implementation scales to systems with an arbitrary number of bodies thanks to a tiling approach that allows performance on the order of 13,441 MPairs/s, outperforming state-of-the-art FPGA implementations in both raw performance and performance per watt. Finally, our design proves to be more power efficient than the Grape-8 ASIC.
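To make the All-Pairs idea concrete, the following is a minimal software sketch, not the hardware architecture of the paper. It assumes Newtonian gravity, and the function name `all_pairs_accelerations` and its `tile_size`, `g` and `softening` parameters are hypothetical. The abstract does not describe the actual tiling scheme, so the tile loop here only illustrates how the N x N interaction matrix can be processed one block of source bodies at a time.

```python
import math


def all_pairs_accelerations(positions, masses, tile_size=2, g=1.0, softening=0.0):
    """Sum the acceleration on every body caused by every other body.

    positions: list of (x, y, z) tuples; masses: list of floats.
    softening is added to the squared distance; with softening=0.0 two
    bodies at the same position would cause a division by zero.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    # Process the source bodies tile by tile (illustrative tiling only).
    for tile_start in range(0, n, tile_size):
        tile = range(tile_start, min(tile_start + tile_size, n))
        for i in range(n):
            for j in tile:
                if i == j:
                    continue  # a body exerts no force on itself
                d = [positions[j][k] - positions[i][k] for k in range(3)]
                r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + softening
                scale = g * masses[j] / (r2 * math.sqrt(r2))
                for k in range(3):
                    acc[i][k] += scale * d[k]
    return acc
```

Each of the N bodies interacts with the other N - 1 bodies, which is why the cost grows quadratically and why throughput is naturally measured in pairs per second.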
Satellites are crucial for the modern world to function properly, as they provide Global Navigation Satellite System (GNSS) services and global communication. However, the data stored on these satellites can be corrupted by the radiation found in space, which can flip its bits. In the past, Forward Error Correction (FEC) algorithms were selected based on their strength and implemented to correct these bit flips back to their original values. This thesis seeks to determine whether the strength of two FEC algorithms, the Reed Solomon (RS) code and the Reed Solomon Product Code (RSPC), directly translates to their effectiveness. These algorithms were coded and tested in Matrix Laboratory (MATLAB) and on a field programmable gate array (FPGA) under controlled parameters, including the data set size, the number of bit flips introduced, and the distribution of the bit flips within the data set. The experimental results show that these other factors also significantly influence the effectiveness of the algorithms. Knowing which factors influence an algorithm's effectiveness enables better decisions about which FEC algorithm to use for a given set of circumstances. The RS code should be used if the data set is small enough for a single-instance RS code and the range of expected bit flips is narrow and below the code's correctable limit. If the data set is large, or the range of expected bit flips varies widely and surpasses the RS code's correctable limit, the RSPC should be used: it gives a higher overall success rate, at the cost of a lower number of bit flips up to which correction succeeds 100% of the time.
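The influence of the bit-flip distribution can be illustrated with a small calculation. An RS(n, k) code corrects up to (n - k) / 2 symbol errors, so what matters is how many symbols the flipped bits touch, not how many bits are flipped. The sketch below assumes 8-bit symbols and an RS(255, 223) code purely for illustration; the thesis's actual code parameters are not given in the abstract, and the function names are hypothetical.

```python
def corrupted_symbols(flipped_bits, bits_per_symbol=8):
    """Count the distinct symbols touched by a set of flipped bit positions."""
    return len({bit // bits_per_symbol for bit in flipped_bits})


def rs_can_correct(flipped_bits, n=255, k=223, bits_per_symbol=8):
    """True if the symbol errors stay within the RS(n, k) correctable limit."""
    t = (n - k) // 2  # maximum number of correctable symbol errors
    return corrupted_symbols(flipped_bits, bits_per_symbol) <= t


# Same number of bit flips (64), two different distributions.
burst = list(range(64))                    # 64 adjacent bits -> 8 symbols
scattered = [8 * s for s in range(64)]     # one bit in each of 64 symbols
```

With these assumed parameters, the burst of 64 flips damages only 8 symbols and stays within the limit of 16, while the same 64 flips scattered one per symbol damage 64 symbols and exceed it.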
Objectives: In this study, an infection screening system was developed to detect patients suffering from infectious diseases. In addition, the system was also designed to deal with the variability in age and gender, w...
The authors present a novel approach that uses reconfigurable fabric to accelerate a face detection algorithm based on the Haar classifier. With a highly pipelined architecture that utilises the abundant parallel arithmetic units in the FPGA, the authors have achieved real-time face detection with a very high detection rate and low false positives. The 1-classifier and 16-classifier realisations in an accelerator provide 10x and 72x speedups, respectively, over the software counterpart. Moreover, the authors' approach scales with the resources available on the FPGA, and it will gain more momentum as the Geneseo Initiative is introduced in the market. This work also provides an understanding of how reconfigurable fabric can be used to accelerate non-systolic vision algorithms.
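Haar-classifier detectors are typically built on rectangle sums taken from an integral image, which is the kind of regular arithmetic that maps well to parallel hardware. The following is a minimal software sketch of that building block, not the authors' accelerator; the function names and the choice of a two-rectangle edge feature are assumptions made for illustration.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Haar-like edge feature: left half minus right half of the window."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every rectangle sum costs four table lookups regardless of its size, many features can be evaluated side by side once the integral image is available.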
Power quality disturbances (PQD) in electric distribution systems can be produced by the use of non-linear loads or by environmental circumstances, causing electrical equipment to malfunction and reducing its useful life. Detecting and classifying different PQDs requires great effort in planning and structuring the monitoring system. The main disadvantage of most works in the literature is that they treat a limited number of electrical disturbances through personal computer (PC)-based computation techniques, which makes it difficult to perform online PQD classification. The novel contribution of this work is a methodology for PQD recognition and classification through the discrete wavelet transform, mathematical morphology, singular value decomposition, and statistical analysis. Furthermore, timely and reliable classification of different disturbances is necessary; hence, a field programmable gate array (FPGA)-based integrated circuit is developed to offer a portable hardware processing unit that performs fast, online PQD classification. The numerical and experimental results demonstrate that the proposed method is highly effective in online PQD detection and classification of real voltage/current signals.
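As a rough illustration of why the discrete wavelet transform is useful here, the sketch below applies a single-level Haar transform to a sinusoid carrying one impulsive transient. The Haar wavelet, the 64-sample test signal and the function name `haar_dwt` are all assumptions; the abstract does not state which wavelet family or decomposition depth the authors use.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT of an even-length signal.

    Returns (approximation, detail) coefficient lists of half the length.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


# One period of a sinusoid with a hypothetical transient added at sample 21.
signal = [math.sin(2.0 * math.pi * k / 64.0) for k in range(64)]
signal[21] += 1.0

approx, detail = haar_dwt(signal)
# The largest detail coefficient marks the sample pair holding the transient.
peak = max(range(len(detail)), key=lambda i: abs(detail[i]))
```

The smooth sinusoid contributes only small detail coefficients, so the transient stands out in the detail band and its index (`peak`, covering samples `2 * peak` and `2 * peak + 1`) localises the disturbance in time.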
A wideband digital transmitting beamformer based on linear frequency modulation (LFM) signals is presented in this study. The wideband beamformer is realised as a combination of direct digital synthesisers and fractional delay (FD) filters in a polyphase structure. Using the coordinate rotation digital computer (CORDIC) algorithm, a high-intermediate-frequency wideband LFM signal is generated and phase compensation for beamforming is accomplished in a field programmable gate array. The impact of the number of quantisation bits on signal generation is analysed. The results of waveform generation and FD filter design are given. Finally, the transmitting beam pattern is simulated.
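The coordinate rotation digital computer (CORDIC) algorithm turns a phase value into sine and cosine samples using only shifts and additions, which is why it suits phase-to-amplitude conversion in a synthesiser. The sketch below is a floating-point illustration of rotation-mode CORDIC, not the fixed-point design of the study; the function name and the default of 16 iterations are assumptions.

```python
import math


def cordic_sin_cos(angle, iterations=16):
    """Rotation-mode CORDIC; returns (sin, cos) for |angle| <= pi/2.

    In hardware the multiplications by 2**-i are bit shifts and the
    arctangent table is a small ROM; floats are used here for clarity.
    """
    table = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-scale by the inverse of the accumulated CORDIC gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0   # rotate towards zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * table[i]
    return y, x
```

Each extra iteration contributes roughly one more bit of angular resolution, which gives one way to think about the trade-off between word length and signal quality that the study analyses.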
In this paper, a robust speech recognition system for recognising speech subjected to environmental noise is designed and implemented on an FPGA to control a home service robot wirelessly. Empirical mode decomposition is used to separate the clean speech from speech signals contaminated by environmental noise. To improve the recognition speed, a discrete hidden Markov model (DHMM) is used instead of a continuous hidden Markov model (CHMM), which reduces the computational load during speech recognition. However, to compensate for the lower recognition rate of the DHMM, this paper applies fuzzy vector quantization (FVQ) to the DHMM modelling. It will be shown that the computation time increases only slightly, while the speech recognition rate increases considerably when FVQ is applied. Finally, combined with a wireless module, an FPGA-based speech recognition system is designed to control the motions of a home service robot wirelessly via speech commands under environmental noise. The performance of the designed system is demonstrated at the end of this paper.
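The difference between ordinary and fuzzy vector quantization can be sketched briefly: instead of mapping a feature vector to its single nearest codeword, FVQ assigns it a graded membership in every codeword. The sketch below uses the membership formula from fuzzy c-means as an assumption; the paper's exact formulation, the fuzziness exponent `m` and the function name are not taken from the abstract.

```python
def fuzzy_memberships(vector, codebook, m=2.0):
    """Membership of `vector` in each codeword; the values sum to 1.

    m > 1 controls fuzziness: values close to 1 approach a hard
    nearest-codeword assignment.
    """
    sq = [sum((v - c) ** 2 for v, c in zip(vector, code)) for code in codebook]
    exact = [i for i, d in enumerate(sq) if d == 0.0]
    if exact:
        # The vector coincides with one or more codewords.
        return [1.0 / len(exact) if i in exact else 0.0 for i in range(len(sq))]
    exponent = 1.0 / (m - 1.0)
    weights = [(1.0 / d) ** exponent for d in sq]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, a vector at the origin with codewords at distances 1 and 2 receives memberships of 0.8 and 0.2 when `m = 2`, so information about the second-nearest codeword is kept rather than discarded.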
Random number generators are used in many areas, such as cryptography, applications of the Monte Carlo method, and numerical analysis with computer simulation and modelling. True random number generators (TRNGs) used in cryptography and secure communications require fast, secure and intensive processing, and they use physical methods without deterministic character as their entropy source. These methods are direct amplification, dual oscillators and chaos-based approaches. In recent years, great effort has gone into developing chaos-based TRNG structures because of the noise-like behaviour of chaotic oscillators and their ability to hide information-bearing signals. Chaos-based TRNGs built from digital circuits are an effective alternative to the traditional chaos-based analog structure, because TRNG systems that use an analog chaotic signal generator are difficult to synchronise between transmitter and receiver. In addition, the implementation of these analog circuits relies on weak sources of physical noise, such as thermal or scattering noise. Digital FPGA chips have significant potential to improve information security in applications such as cryptology and secure communication, which require high performance and processing power. In this study, the performance differences between conventional TRNG methods that use a chaotic system and recently designed FPGA-based chaotic systems are compared.
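To show what "chaos-based" bit generation looks like in digital form, the sketch below thresholds the output of the logistic map. The map, its parameter `r = 4`, the threshold of 0.5 and the function name are illustrative assumptions, not the systems compared in the study. Note that this software version is fully deterministic for a given seed; a true random number generator additionally needs a physical entropy source.

```python
def logistic_bits(seed, count, r=4.0, burn_in=100):
    """Generate `count` bits by thresholding the logistic map x -> r*x*(1-x).

    seed must lie strictly between 0 and 1. The first `burn_in` iterations
    are discarded so the output does not start near the seed value.
    """
    x = seed
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(count):
        x = r * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits
```

The output looks noise-like even though the update rule is a single multiply-and-subtract, which hints at why chaotic maps are attractive for compact digital implementations.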
Vortex dynamics and aeroacoustic energy transfer, which play essential roles in vortex-excited acoustic resonance inside straight channels with coaxial side-branches, were investigated by phase-locked particle image velocimetry (PIV) and Howe's acoustic analogy. In the experiments, the periodic acoustic pressure fluctuations at the endplates of the side branches were used to trigger PIV via a field-programmable gate array control system. The results revealed that the spatiotemporal evolution of vortex shedding can be classified into three regions in response to the acoustic standing-wave propagations: the formation region, the convection region, and the collapse region, along with the flapping recirculation zone and the intermittent vertical flow streaks that occur inside the side branches. Further investigation was performed in terms of phase-dependent quantities such as the shear and normal stresses; the normal stress production, which was attributed to the evolution of vortex shedding, was found to be the major contributor to the kinematics and energetics of the self-sustained flow. Finally, Howe's acoustic analogy was used to determine the instantaneous acoustic power and the accumulated aeroacoustic energy during one acoustic resonance cycle. The aeroacoustic energy extracted from the acoustic standing-wave propagations contributed to the formation and subsequent growth of the shedding vortex, whereas the decreased turbulent kinetic energy of the shedding vortex was transferred to the acoustic standing waves to maintain the longitudinal wave propagations. Published by AIP Publishing.
Optimizing for routability during FPGA placement is becoming increasingly important, as failure to spread and resolve congestion hotspots throughout the chip, especially in the case of large designs, may result in placements that either cannot be routed or that require the router to work excessively hard to succeed. In this article, we introduce a new, analytic routability-aware placement algorithm for Xilinx UltraScale FPGA architectures. The proposed algorithm, called GPlace3.0, seeks to optimize both wirelength and routability. Our work contains several unique features, including a novel window-based procedure for satisfying legality constraints in lieu of packing, an accurate congestion estimation method based on modifications to the pathfinder global router, and a novel detailed placement algorithm that optimizes both wirelength and external pin count. Experimental results show that, compared to the top three winners of the recent ISPD'16 FPGA placement contest, GPlace3.0 achieves (on average) a 7.53%, 15.15%, and 33.50% reduction in routed wirelength, respectively, while requiring less overall runtime. In addition, 360 further benchmarks were provided directly by Xilinx Inc. These benchmarks were used to compare GPlace3.0 to the most recently improved versions of the first- and second-place contest winners. Subsequent experimental results show that GPlace3.0 outperforms the improved placers in a variety of areas, including the number of best solutions found, the number of benchmarks that cannot be routed, the runtime required to perform placement, and the runtime required to perform routing.
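Placement tools usually need a wirelength estimate that is cheap to evaluate before routing. A common choice is half-perimeter wirelength (HPWL), sketched below; whether GPlace3.0 uses exactly this metric internally is not stated in the abstract, and the function name and data layout are assumptions. The reported results concern routed wirelength, which HPWL only approximates.

```python
def hpwl(nets, placement):
    """Half-perimeter wirelength summed over all nets.

    nets: list of nets, each a list of cell names.
    placement: dict mapping cell name -> (x, y) location.
    """
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        # Half the perimeter of the bounding box enclosing the net's cells.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total
```

Moving a cell changes only the bounding boxes of the nets attached to it, so the estimate can be updated incrementally while the placer explores candidate moves.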