ISBN: 9643600572 (print)
This study compares five 32-bit integer multipliers across several performance measures. The multipliers compared are the array multiplier, the modified Booth (radix-4) multiplier, the optimized Wallace tree multiplier, the combined modified Booth-Wallace tree multiplier, and the twin-pipe serial-parallel multiplier. The comparison is based on results obtained by synthesizing all multiplier architectures for an FPGA target.
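To illustrate one of the architectures named above, here is a minimal software sketch of radix-4 (modified Booth) recoding. It is a behavioral model only, not the synthesized design from the paper; the function name `booth_radix4_multiply` and its parameters are assumptions made for illustration. The point it shows is that recoding the multiplier into digits from {-2, -1, 0, 1, 2} halves the number of partial products (16 instead of 32 for a 32-bit operand).

```python
def booth_radix4_multiply(a: int, b: int, width: int = 32) -> int:
    """Multiply two signed `width`-bit integers via radix-4 Booth recoding of b.

    Hypothetical helper for illustration; models the arithmetic, not the hardware.
    """
    if width % 2 != 0:
        raise ValueError("width must be even for radix-4 recoding")
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in a signed width-bit integer")

    b_bits = b & ((1 << width) - 1)  # two's complement bit pattern of b
    product = 0
    prev = 0  # implicit bit b[-1] = 0
    for i in range(0, width, 2):
        b0 = (b_bits >> i) & 1
        b1 = (b_bits >> (i + 1)) & 1
        # Booth digit for the bit triplet (b[i+1], b[i], b[i-1]), in {-2,...,2}
        digit = b0 + prev - 2 * b1
        # One partial product per digit, weighted by 4^(i/2)
        product += (digit * a) << i
        prev = b1
    return product
```

In hardware, each digit selects 0, ±a, or ±2a (a shift and an optional negation), and the resulting partial products are summed by an adder array or a Wallace tree.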
Traditional direction of arrival (DOA) estimation algorithms suffer from problems such as heavy computation and slow convergence. In this paper, a neural network method is used to improve the performance of DOA estimation. Because a BP neural network tends to become trapped in local minima, the particle swarm optimization (PSO) algorithm is applied to optimize its weights and thresholds. A DOA estimation model based on the PSO-BP neural network is constructed and trained. Simulation results show that, compared with the classical RBFNN method and the traditional MUSIC algorithm, the optimized BP neural network method achieves higher estimation accuracy and better real-time performance.
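The idea of letting PSO search the weight and threshold space of a small network can be sketched as follows. This is a toy under stated assumptions, not the paper's model: the network size (1 input, 3 tanh hidden units, 1 output), the sine-fitting data in `DATA`, the PSO coefficients, and all function names are hypothetical, and the back-propagation refinement stage is omitted.

```python
import math
import random

H = 3  # hidden units (assumed for illustration)
DIM = 3 * H + 1  # input weights, hidden thresholds, output weights, output threshold

# Toy regression data standing in for (array measurement, angle) pairs.
DATA = [(x / 10.0, math.sin(x / 10.0)) for x in range(-15, 16)]

def forward(params, x):
    """Evaluate the tiny network with a flat parameter vector."""
    w1, b1 = params[0:H], params[H:2 * H]
    w2, b2 = params[2 * H:3 * H], params[3 * H]
    return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(H)) + b2

def mse(params, data=DATA):
    """Mean squared error, used as the PSO fitness (lower is better)."""
    return sum((forward(params, x) - y) ** 2 for x, y in data) / len(data)

def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; returns the best position and the best-fitness history."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    history = [gbest_f]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
        history.append(gbest_f)
    return gbest, history
```

Calling `pso(mse, DIM)` returns a parameter vector that could then seed gradient-based BP training, which is one common way to combine the two methods.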
Implementing image-processing systems can require significant effort and resources because of information volume and algorithm complexity. Model integrated computing (MIC) based image-processing systems show promise in supporting solutions to these complex problems. While MIC has advanced the execution of complex image-processing tasks on parallel embedded systems, it has not addressed a challenging class of algorithms that adapt the image-processing algorithm based on the information or state of the image-processing system. The proposed effort addresses the creation of an adaptive image-processing environment based on MIC that allows solutions to complex image-processing problems to be built and executed rapidly. This effort involves creating a new modeling representation for image-processing adaptation mechanisms. The proposed MIC-based adaptive image-processing environment generates a solution given the modeling constraints and executes it on a number of hardware architectures.
ISBN: 9781509039005 (print)
Computer hardware is currently moving towards heavily parallelized architectures with multiprocessor, multicore, and chip-multithreaded designs. Cache memory, the fastest component of the memory hierarchy, adapts to this new kind of parallel system in order to provide the promised performance increase. Current cache designs have limitations that can be turned into optimization opportunities in both hardware and software. This paper presents a detailed study of cache performance in multicore processors, considering critical hardware aspects. A new solution is proposed to improve current performance: an optimized replacement policy for the shared cache level. In experiments run on four-core and eight-core setups in a multicore simulator, the proposed enhancements achieve up to a 30% increase in execution speed over the default setup.
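The abstract does not describe the proposed replacement policy, so the sketch below shows only a conventional baseline: a set-associative cache with least-recently-used (LRU) replacement, the kind of default against which an optimized policy would typically be measured. The class name, the 64-byte line size, and the address mapping are assumptions for illustration.

```python
from collections import OrderedDict

class SetAssociativeLRUCache:
    """Toy set-associative cache model with LRU replacement (baseline only)."""

    def __init__(self, num_sets: int, ways: int, line_size: int = 64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Simulate one memory access; return True on a hit."""
        line = address // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = True
        return False
```

For a shared cache level, the access streams of several cores would be interleaved into the same structure; a different policy could be tried by changing only the eviction choice in `access`.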
This paper quantitatively studies the effects of trace generation on the performance and accuracy of the BigSim Emulator, a scalable parallel emulator for large-scale computers. To assess the effect on accuracy, we modify the emulator code to collect the predicted computation time. Four MPI programs with different computation-to-communication ratios are used as benchmarks. The emulation time and the predicted computation time, with trace generation both enabled and disabled, are collected on two parallel host machines. The results show that although the BigSim Emulator traces only communication events and dependencies, trace generation still noticeably degrades emulation performance for programs with high communication-to-computation ratios. Trace generation also significantly affects the accuracy of the predicted computation time for communication-intensive programs, an issue that cannot be overlooked.
This paper presents a real-time stereo video processing system based on an FPGA. The system uses rectification and histogram equalization as pre-processing, and performs depth detection using the generalized census transform and a block matching method. With the help of a projected pattern generated on-line by the pattern controller inside the FPGA, the system can be used in various environments. A median filter is used as the post-processing step for the depth map. Compared with a software solution, the system takes advantage of the parallel nature of the FPGA and generates the depth map at higher speed. It can therefore be applied in applications that demand better performance.
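The depth-detection step can be illustrated with a simplified software sketch. It uses the standard 3x3 census transform rather than the generalized variant from the paper, and it compares single census codes along a scanline instead of aggregating costs over a block; all function names are hypothetical.

```python
def census_3x3(img):
    """Return the 8-bit census code of each interior pixel of a 2-D list image.

    Each bit records whether a neighbor is darker than the center pixel,
    scanning the 3x3 window row by row. Border pixels are left as 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            bits = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    bits = (bits << 1) | (1 if img[y + dy][x + dx] < center else 0)
            out[y][x] = bits
    return out

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")

def disparity_at(left_census, right_census, y, x, max_disp):
    """Pick the disparity d minimizing the Hamming cost between
    left (y, x) and right (y, x - d); ties go to the smallest d."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d < 0:
            break
        cost = hamming(left_census[y][x], right_census[y][x - d])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Because each pixel's code and each candidate cost is independent of the others, these loops map naturally onto parallel FPGA logic, which is the property the abstract relies on for speed.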
ISBN: 9781728109459 (digital)
ISBN: 9781728109466 (print)
With the rapid development of next-generation sequencing (NGS) technology, the ever-increasing volume of biological sequence data poses a tremendous challenge to data processing. There is therefore an urgent need for intensive computing power to speed up data analysis. Among state-of-the-art parallel accelerators, the Intel Xeon Phi coprocessor is a bootable host processor based on the Intel Many Integrated Core (MIC) architecture that provides massive parallelism and vectorization to support the most demanding high-performance computing (HPC) applications. The underlying x86 architecture supports common parallel programming standard libraries, which provide familiarity and flexibility when porting existing code to heterogeneous computing environments. In addition, it offers three usage models (native, offload, and symmetric) to solve different application problems on MIC-based neo-heterogeneous architectures. Intel Xeon Phi is currently becoming a common parallel computing platform for reducing the computational cost of the most demanding processes in bioinformatics. To help researchers make better use of MIC, we review MIC-based bioinformatics applications and provide a comprehensive guideline for bioinformatics researchers who wish to apply MIC in their own fields.
ISBN: 9781424445523 (print)
Multi-field packet classification is a critical function that enables network routers to support a variety of applications such as firewall processing, Quality of Service differentiation, traffic billing, and other value-added services. The explosive growth of Internet traffic requires that future packet classifiers be implemented in hardware. However, most existing packet classification algorithms need a large amount of memory, which inhibits efficient hardware implementation. This paper exploits modern FPGA technology and presents a partitioning-based parallel architecture for scalable, high-speed packet classification. We propose a coarse-grained independent sets algorithm and combine it seamlessly with the cross-producting scheme. After partitioning the original rule set into several coarse-grained independent sets and applying the cross-producting scheme to the remaining rules, the memory requirement is dramatically reduced. Our FPGA implementation results show that the architecture can store 10K real-life rules in a single state-of-the-art FPGA while consuming a small amount of on-chip resources. Post place-and-route results show that the design sustains 90 Gbps throughput for minimum-size (40-byte) packets, which is more than twice the current backbone network link rate.
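The throughput figure can be restated as a packet rate with a short calculation. This is plain arithmetic on the numbers in the abstract (90 Gbps, 40-byte packets) and ignores any link-layer framing overhead; the helper name is hypothetical.

```python
def packets_per_second(throughput_bps: float, packet_bytes: int) -> float:
    """Convert a bit rate into a packet rate for fixed-size packets."""
    return throughput_bps / (packet_bytes * 8)

rate = packets_per_second(90e9, 40)
print(f"{rate / 1e6:.2f} Mpps")  # prints "281.25 Mpps"
```

In other words, the classifier has to complete a lookup roughly every 3.6 ns, which is why a pipelined or parallel hardware architecture is needed.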
In many real-world scenarios, relationships between two different entities can be naturally represented as bipartite graphs, such as author-paper, user-item, and people-location. Cohesive subgraph search, which aims t...
The 3780-point FFT is a main component of the time domain synchronous OFDM (TDS-OFDM) system in the Chinese Digital Multimedia/TV Broadcasting-Terrestrial (DMB-T) national standard. In this paper, we propose an efficient implementation of the 3780-point FFT on multi-core processors that makes full use of the parallelism of both the FFT application and the multi-core system. Experimental results demonstrate that the multi-core 3780-point FFT implementation not only achieves a speedup of 6.475 over a single-core implementation but also retains great flexibility compared with an ASIC implementation.
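Since 3780 = 2^2 · 3^3 · 5 · 7 is not a power of two, a radix-2 FFT does not apply directly. The sketch below shows one generic way to handle such a length: a recursive mixed-radix Cooley-Tukey decomposition that splits on the smallest prime factor. It is a single-threaded illustration under stated assumptions, not the paper's multi-core algorithm, and the function names are hypothetical.

```python
import cmath

def smallest_factor(n: int) -> int:
    """Smallest prime factor of n (returns n itself if n is prime or 1)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n

def factorize(n: int) -> list:
    """Prime factorization in non-decreasing order."""
    factors = []
    while n > 1:
        p = smallest_factor(n)
        factors.append(p)
        n //= p
    return factors

def naive_dft(x):
    """Direct O(n^2) DFT, used for prime lengths and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def mixed_radix_fft(x):
    """Decimation-in-time FFT for any length n >= 1, splitting n = p * m."""
    n = len(x)
    p = smallest_factor(n)
    if p == n:
        return naive_dft(x)
    m = n // p
    # p interleaved sub-sequences of length m; each sub-FFT is independent.
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]
    # Combine with twiddle factors: X[k] = sum_r W_n^(r*k) * S_r[k mod m]
    return [sum(cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m]
                for r in range(p))
            for k in range(n)]
```

The sub-transforms computed in `subs` do not depend on one another, so they are natural units of work to distribute across cores, which is the kind of parallelism a multi-core implementation can exploit.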