ISBN: (Print) 9783030050542; 9783030050535
The Fourier transform is one of the most important algorithms and is applied in a wide range of fields, such as signal processing and data compression. In real-world applications such as image compression (JPEG), the Fourier transform mostly processes real-valued input. These transforms are called real DFTs (real discrete Fourier transforms) in this paper, and it is therefore critical to optimize real DFT for specific platforms. In this paper, we implement 1D and 2D real DFT on the ARMv8 platform, the flagship architecture of ARM. The real DFT kinds implemented and optimized include R2HC, HC2R, DHT, DCT-I to DCT-IV, and DST-I to DST-IV, with special optimization for input sizes of the form $2^q 3^n 5^m$. To achieve high performance, optimization is carried out in the following aspects: (1) reduction of the computational complexity of real DFT; (2) implementation of a high-performance 1D complex DFT algorithm to support real DFT; (3) for the 2D real DFT, a cache-aware blocking approach to improve cache performance. Experimental results show that, compared with FFTW 3.3.7, 1D-Float DFT gains a 1.52x speedup on average across all real DFT kinds, with a maximum speedup of 1.79x; 1D-Double DFT gains a 1.34x speedup on average across all real DFT kinds, with a maximum speedup of 1.61x; 2D-Float DFT gains a 1.41x speedup on average across all real DFT kinds, with a maximum speedup of 1.70x; and 2D-Double DFT gains a 1.10x speedup across all real DFT kinds, with a maximum speedup of 1.25x.
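The abstract does not say how the complexity reduction in aspect (1) is achieved. One well-known way to build a real DFT on top of a complex DFT, shown here only as an illustrative sketch and not as the authors' method, is to pack a real sequence of even length N into a complex sequence of length N/2, transform it once, and then separate the even-index and odd-index spectra. The function names `complex_dft` and `real_dft_half` are hypothetical, and the naive O(M^2) `complex_dft` merely stands in for an optimized complex FFT.

```python
import cmath


def complex_dft(z):
    """Naive O(M^2) complex DFT; a stand-in for an optimized complex FFT."""
    m = len(z)
    return [
        sum(z[n] * cmath.exp(-2j * cmath.pi * n * k / m) for n in range(m))
        for k in range(m)
    ]


def real_dft_half(x):
    """Return X[0..N/2] of the DFT of a real, even-length sequence x.

    Only the first N/2 + 1 outputs are returned, because the remaining
    ones follow from Hermitian symmetry: X[N - k] = conj(X[k]).
    """
    n = len(x)
    if n == 0 or n % 2:
        raise ValueError("length must be even and nonzero")
    m = n // 2

    # Pack even-index samples into the real part, odd-index into the imaginary part.
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(m)]
    zf = complex_dft(z)  # one complex transform of half the length

    out = []
    for k in range(m + 1):
        a = zf[k % m]
        b = zf[(m - k) % m].conjugate()
        even = (a + b) / 2   # DFT of the even-index samples at bin k
        odd = (a - b) / 2j   # DFT of the odd-index samples at bin k
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        out.append(even + twiddle * odd)
    return out
```

Calling `real_dft_half([1.0, 2.0, 3.0, 4.0])` returns three complex values, X[0] through X[2], which is the same information that a half-complex (R2HC-style) output carries for a length-4 real input.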
Impurity-induced disordering (IID) in vertical-cavity surface-emitting lasers (VCSELs) has been shown to provide enhanced performance, such as achieving single fundamental-mode operation with higher output powers when...
With the development of virtualization technology, desktop virtualization is becoming more and more mature. Desktop virtualization can effectively isolate the use of users and the management of the system, but in the ...
Knowledge compilation as part of the Weighted Model Counting approach has proven to be an efficient tool for exact inference in probabilistic graphical models, by exploiting structures that more traditional methods ca...
In this paper, we propose a novel model which exploits the topic relevance to enhance the word embedding learning. We attempt to leverage the hidden topic-bigram model to build topic relevance matrices, then learn the...
This paper presents a new millimeter-wave sensing system used for near-field imaging applications. The presented system is composed of a parabolic reflector which is fed by an array of compressive antennas. A Compress...
ISBN: (Print) 9783319951683; 9783319951676
We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel AVX-512 architecture. The most time-consuming part of numerical simulations of lattice QCD is solving a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of AVX-512, such as using intrinsics and manual prefetching, for the matrix multiplication. Based on experience with the Oakforest-PACS system, a large-scale cluster composed of Intel Xeon Phi Knights Landing processors, we discuss performance tuning that exploits AVX-512 and code design for the SIMD architecture and massively parallel machines. We observe that the same code runs efficiently on an Intel Xeon Skylake-SP machine.
Silicon heterojunction solar cells are primarily fabricated with high-quality wafers, resulting in a higher manufacturing cost than mainstream solar cells. We explore the impact of defect engineering methods of hydrog...
A number of differences have emerged between modern and classic approaches to constituency parsing in recent years, with structural components like grammars and feature-rich lexicons becoming less central while recurre...
The model produced by a 3D reconstruction algorithm is usually represented by voxels. Methods for managing these voxels usually fall into two categories: ordered and unordered. The ordered method holds too ma...