Kyber is a promising lattice-based post-quantum cryptography (PQC) scheme for key encapsulation. Since the number theoretic transform (NTT) is the most computationally expensive operation in Kyber, this paper focuses on novel optimization techniques for efficient hardware implementation of the NTT for Kyber. Benefiting from the proposed fast modular multiplication method and a doubled-bandwidth ping-pong memory access scheme, our NTT architecture completes one Kyber NTT operation in 490 cycles using only 609 LUTs, 640 FFs, and 2 DSPs on a Xilinx Artix-7 FPGA. The proposed NTT architecture is 3.95 times faster than the state-of-the-art design for Kyber while improving the area-time product (ATP) by more than 1.5 times. Compared with state-of-the-art NTT designs for other algorithms, our NTT architecture uses 24% fewer FFs and 50% fewer DSPs and has the second-smallest ATP, which further confirms the high efficiency of our design.
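For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of the Kyber NTT and its inverse, following the loop structure of the public Kyber reference code (q = 3329, n = 256, root of unity 17). It is not the architecture described in the abstract: the plain `%` reductions stand in for whatever fast modular multiplication the hardware uses, and the function names `ntt`, `inv_ntt`, and `brv7` are illustrative choices.

```python
Q = 3329          # Kyber modulus
N = 256           # polynomial length
ROOT = 17         # primitive 256th root of unity mod Q


def brv7(k: int) -> int:
    """Reverse the 7 low bits of k."""
    return int(f"{k:07b}"[::-1], 2)


# Twiddle factors in bit-reversed order, as in the reference code.
ZETAS = [pow(ROOT, brv7(k), Q) for k in range(128)]


def ntt(poly):
    """Forward NTT: 7 layers of 128 Cooley-Tukey butterflies each."""
    f = list(poly)
    k, length = 1, 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[k]
            k += 1
            for j in range(start, start + length):
                t = zeta * f[j + length] % Q      # modular multiplication
                f[j + length] = (f[j] - t) % Q
                f[j] = (f[j] + t) % Q
        length >>= 1
    return f


def inv_ntt(poly_hat):
    """Inverse NTT: Gentleman-Sande butterflies, then scaling by 128^-1."""
    f = list(poly_hat)
    k, length = 127, 2
    while length <= 128:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[k]
            k -= 1
            for j in range(start, start + length):
                t = f[j]
                f[j] = (t + f[j + length]) % Q
                f[j + length] = zeta * (f[j + length] - t) % Q
        length <<= 1
    inv_128 = pow(128, -1, Q)                     # equals 3303
    return [x * inv_128 % Q for x in f]
```

Each forward transform performs 7 × 128 = 896 butterflies, each containing one modular multiplication, which is why the modular multiplier and the memory bandwidth feeding the butterflies dominate the cost of a hardware design.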
Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to the huge model size and computation amount, model compression is a critical step in deploying DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model compression approach that is complementary to weight pruning. Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes to different rows of the weight matrix. It is motivated by (1) the fact that the weight distributions of different rows are not the same; and (2) the potential for better utilization of heterogeneous FPGA hardware resources. To achieve that, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2), suitable for Gaussian-like weight distributions, in which multiplication can be replaced with logic shifters and adders, thereby enabling highly efficient implementations with FPGA LUT resources. In contrast, the existing fixed-point quantization is suitable for Uniform-like weight distributions and can be implemented efficiently with DSPs. Then, to fully exploit the resources, we propose an FPGA-centric mixed scheme quantization (MSQ) with an ensemble of the proposed SP2 and the fixed-point schemes. Combining the two schemes can maintain, or even increase, accuracy due to better matching with weight distributions. For the FPGA implementations, we develop a parameterized architecture with heterogeneous Generalized Matrix Multiplication (GEMM) cores: one using LUTs for computations with SP2-quantized weights and the other utilizing DSPs for fixed-point quantized weights. Given the partition ratio among the two schemes based on resource characterization, MSQ q...
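The shift-and-add idea behind SP2 can be illustrated with a small sketch. It assumes a simplified level set in which each weight magnitude is the sum of two terms drawn from {0, 2^-1, ..., 2^-max_shift}; the exact level set, bit allocation, and scaling factor used in the paper are not given in the abstract, so the names `sp2_encode`, `sp2_decode`, `shift_add_mul`, and the parameter `max_shift` are hypothetical.

```python
from itertools import product


def _term(k):
    """Value of one power-of-2 term; k is None for a zero term."""
    return 0.0 if k is None else 2.0 ** -k


def sp2_encode(w, max_shift=3):
    """Quantize w to the nearest level sign * (2^-k1 + 2^-k2).

    Returns (sign, k1, k2). This level set is an illustrative assumption.
    """
    shifts = [None] + list(range(1, max_shift + 1))
    k1, k2 = min(
        product(shifts, repeat=2),
        key=lambda ks: abs(abs(w) - _term(ks[0]) - _term(ks[1])),
    )
    return (-1 if w < 0 else 1), k1, k2


def sp2_decode(code):
    """Recover the real-valued quantized weight from its code."""
    sign, k1, k2 = code
    return sign * (_term(k1) + _term(k2))


def shift_add_mul(x, code, max_shift=3):
    """Multiply integer activation x by an SP2 weight using shifts and adds.

    The result is scaled by 2**max_shift so that every shift is a left
    shift and no bits are lost; no multiplier is needed.
    """
    sign, k1, k2 = code
    acc = 0
    for k in (k1, k2):
        if k is not None:
            acc += x << (max_shift - k)
    return sign * acc
```

For example, a weight of 0.7 is encoded as 2^-1 + 2^-2 = 0.75, and multiplying the activation 12 by it reduces to `(12 << 2) + (12 << 1)`, i.e. 72 in the scaled domain (72 / 8 = 9). A fixed-point weight, by contrast, needs a true multiplier, which is the kind of operation DSP blocks provide.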
As cloud services expand and privacy and security become more crucial, fully homomorphic encryption (FHE), which operates on data in the ciphertext domain, has attracted wide attention. Lattice-based cryptography ...