New formulae for point addition and point doubling on elliptic curves over prime fields are *** on these formulae, an improved Montgomery algorithm is proposed. The theoretical analysis indicates that it is about 13.4% faster than Brier and Joye’s Montgomery algorithm. Experiments on the elliptic curve over a 256-bit prime field recommended by the National Institute of Standards and Technology, and on the curve over a 256-bit prime field in the Chinese elliptic curve standard SM2, support the theoretical analysis.
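The ladder structure that Montgomery-style point multiplication builds on can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a textbook-sized toy curve (y^2 = x^3 + 2x + 2 over F_17, not secure) and ordinary affine addition, not the new formulae of the abstract and not Brier and Joye's x-only formulae. The names `ec_add` and `montgomery_ladder` are hypothetical.

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17 (illustrative only, NOT secure).
# Requires Python 3.8+ for pow(x, -1, m).
P_MOD, A = 17, 2
G = (5, 1)  # a point on the toy curve; the curve group has order 19


def ec_add(p, q):
    """Affine addition on the toy curve; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p), including doubling a point with y = 0
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def montgomery_ladder(k, point):
    """Compute k * point; every key bit costs one addition and one doubling."""
    r0, r1 = None, point  # invariant: r1 - r0 == point
    for bit in bin(k)[2:]:
        if bit == "1":
            r0, r1 = ec_add(r0, r1), ec_add(r1, r1)
        else:
            r0, r1 = ec_add(r0, r0), ec_add(r0, r1)
    return r0
```

Because both branches perform the same pair of operations, the cost per bit is fixed, which is why faster addition and doubling formulae translate directly into a faster ladder.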
Authors: Wu, Tao (Sun Yat Sen Univ, Shenzhen Res Inst, Yuehai Rd, Shenzhen 518057, Guangdong, Peoples R China)
Elliptic curve cryptography is the second most important public-key cryptosystem after RSA. Its fundamental arithmetic is a series of modular multiplications and modular additions. Usually, the Montgomery algorithm is applied to modular multiplications over large integers to reduce the computational complexity. Targeting fast elliptic curve point multiplication over prime fields, a new approach in the residue number system (RNS) is proposed. Compared with other implementations that apply the Montgomery ladder for parallel elliptic curve point multiplication, the proposed method uses a residue number system with a wide dynamic range, which supports continuous multiplications and needs only one RNS Montgomery multiplication to bring the temporary results back into the valid range. Hardware implementation results demonstrate that the computation time for elliptic curve point multiplication over $F_p$ can be greatly reduced: one elliptic curve point multiplication over 384-bit prime curves takes about 0.677 ms on a Xilinx XC6VSX475t device, at an area cost of 41409 slices, 676 DSPs and 138 BRAMs.
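The role of the dynamic range can be illustrated with a small residue number system. The sketch below is an assumption-laden toy, not the RNS Montgomery multiplication of the abstract: the moduli are three small Mersenne primes chosen only because they are pairwise coprime, and `to_rns`, `rns_mul` and `from_rns` are hypothetical helper names. It shows that several multiplications can be chained channel by channel, with no carries between channels, as long as the true product stays below the dynamic range M.

```python
from math import prod

# Pairwise coprime moduli (illustrative choice); M is the dynamic range.
MODULI = (8191, 131071, 524287)
M = prod(MODULI)


def to_rns(x):
    """Represent x by its residues modulo each channel modulus."""
    return tuple(x % m for m in MODULI)


def rns_mul(a, b):
    """Multiply channel-wise; channels are independent of each other."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reconstruct the integer in [0, M) with the Chinese remainder theorem."""
    total = 0
    for r_i, m_i in zip(residues, MODULI):
        big = M // m_i
        total += r_i * big * pow(big, -1, m_i)
    return total % M
```

A wider dynamic range allows more chained multiplications before a reduction step is needed, which is the property the abstract relies on.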
This paper proposes two architectures for the acceleration of Number Theoretic Transforms (NTTs) using a novel Montgomery-based butterfly. We first design a custom NTT hardware accelerator for Field-Programmable Gate Arrays (FPGAs). The butterfly architecture is then expanded to a Modular Arithmetic Logic Unit (MALU) and, for greater reuse and easier programmability, a six-stage pipeline Linux-ready RISC-V core is extended with custom instructions. The performance of the proposed architectures is assessed on a Xilinx Ultrascale+ FPGA and with an Application-Specific Integrated Circuit (ASIC) in 28 nm CMOS technology. On the FPGA, the results for custom acceleration show reductions of 30%, 90% and 42% in the number of Lookup Tables (LUTs) and registers, Block RAMs (BRAMs) and Digital Signal Processors (DSPs), while providing a speedup of 1.9 times in comparison with the state of the art. The ASIC results show that at 1 GHz the proposed architecture requires on average 45% less area and 52% less power than the state of the art. Furthermore, the proposed MALU, operating as an additional execution unit, increases the overall area of the extended RISC-V core by only 10%, without significant changes in the frequency of operation.
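What a Montgomery-based butterfly computes can be sketched in software. The following is a generic illustration, not the architecture proposed in the abstract: the modulus Q = 7681, the radix R = 2^16 and the function names are assumptions made for the example. The twiddle factor is stored in Montgomery form (w * R mod Q), so a single Montgomery reduction yields b * w mod Q without a division.

```python
Q = 7681                 # NTT-friendly prime, 7681 = 15 * 2**9 + 1 (illustrative)
R_BITS = 16
R = 1 << R_BITS          # Montgomery radix, coprime to Q
Q_NEG_INV = (-pow(Q, -1, R)) % R   # -Q^{-1} mod R, precomputed once


def montgomery_reduce(t):
    """Return t * R^{-1} mod Q for 0 <= t < Q * R, using only shifts and masks."""
    m = ((t & (R - 1)) * Q_NEG_INV) & (R - 1)
    u = (t + m * Q) >> R_BITS      # t + m*Q is divisible by R by construction
    return u - Q if u >= Q else u


def butterfly(a, b, w_mont):
    """Cooley-Tukey butterfly: (a + b*w, a - b*w) mod Q, with w in Montgomery form."""
    t = montgomery_reduce(b * w_mont)
    return (a + t) % Q, (a - t) % Q
```

For example, with a twiddle `w`, calling `butterfly(a, b, (w * R) % Q)` gives the same pair as computing `(a + b * w) % Q` and `(a - b * w) % Q` directly.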
The bottleneck of all cryptosystems is the computational complexity of polynomial multiplication, vector multiplication, etc. Thus most of them use some algorithms to reduce the complexity of t...
ISBN: 9781665427012 (print)
The Number Theoretic Transform (NTT) plays a central role in supporting high-performance polynomial multiplication in Post-Quantum Cryptography (PQC) and Fully-Homomorphic Encryption (FHE). This paper proposes a novel Montgomery-based butterfly to efficiently implement NTTs on FPGAs. The proposal relies on prime moduli suitable for FHE, which minimize the requirements and allow the computation of the butterfly to be sped up. A search algorithm is presented to select these moduli, while flexibility is a target in all parameters, making the proposed architectures well-suited for FHE and PQC schemes. We experimentally evaluate the effectiveness of the novel butterfly core on a Xilinx Virtex-7 device. The results show reductions of up to 19%, 41%, 37%, and 67% in the number of lookup tables, slices, flip-flops, and Digital Signal Processors (DSPs), respectively, in comparison to the related state of the art. By integrating the proposed butterflies in a complete NTT accelerator, a speedup of up to 1.42 is achieved, while less than half of the DSPs are required, when compared to the other proposals. Moreover, the integration of the proposed accelerators into FHE-based processors is discussed.
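How the NTT supports polynomial multiplication can be shown with a tiny example. This sketch assumes toy parameters (Q = 17, N = 4, and W = 4 as a primitive 4th root of unity modulo 17) and uses a direct O(N^2) transform for clarity instead of a butterfly network; the function names are hypothetical. It multiplies two polynomials modulo x^N - 1 by transforming, multiplying pointwise, and transforming back.

```python
Q, N, W = 17, 4, 4   # toy parameters: 4**2 % 17 == 16 (i.e. -1), 4**4 % 17 == 1


def ntt(a, w):
    """Evaluate the polynomial a at the powers of w (direct transform)."""
    return [sum(a[j] * pow(w, i * j, Q) for j in range(N)) % Q for i in range(N)]


def intt(a):
    """Inverse transform: use w^{-1} and scale by N^{-1} mod Q."""
    n_inv = pow(N, -1, Q)
    w_inv = pow(W, -1, Q)
    return [(x * n_inv) % Q for x in ntt(a, w_inv)]


def cyclic_mul(a, b):
    """Multiply a and b modulo (x^N - 1, Q) through pointwise products."""
    fa, fb = ntt(a, W), ntt(b, W)
    return intt([(x * y) % Q for x, y in zip(fa, fb)])


def schoolbook_cyclic_mul(a, b):
    """Reference O(N^2) cyclic convolution for comparison."""
    return [sum(a[i] * b[(k - i) % N] for i in range(N)) % Q for k in range(N)]
```

In hardware, the direct transform is replaced by log2(N) stages of butterflies, which is where a faster butterfly pays off.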
Modular multiplication is the key module of public-key cryptosystems such as RSA (Rivest-Shamir-Adleman) and ECC (Elliptic Curve Cryptography). However, the efficiency of modular multiplication, especially the modular square, is very low. In order to reduce the operation cycles and power consumption and to improve the efficiency of public-key cryptosystems, a dual-field efficient FIPS (Finely Integrated Product Scanning) modular multiplication algorithm is proposed. The algorithm makes full use of the correlation of the data in the case of equal operands so as to avoid some redundant operations. The experimental results show that the operation speed of the modular square is increased by 23.8% compared to the traditional algorithm, after the multiplication and addition operations are reduced by about $(s^2 - s)/2$ and the read operations by about $s^2 - s$, where $s = n/32$ for $n$-bit operands. In addition, since the algorithm supports length-scalable and dual-field modular multiplication, distinct applications focused on performance or cost can be satisfied by adjusting the relevant parameters.
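The source of the $(s^2 - s)/2$ saving can be made concrete with a product-scanning squaring routine. This is a sketch of the squaring step only, assuming 32-bit words; it omits the Montgomery reduction that FIPS interleaves and is not the algorithm of the abstract. The names `to_words` and `square_product_scanning` are hypothetical. For equal operands, each cross product a[i] * a[j] with i < j appears twice, so it is computed once and doubled.

```python
W = 32
MASK = (1 << W) - 1


def to_words(x, s):
    """Split x into s little-endian 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]


def square_product_scanning(a):
    """Square a multi-word integer column by column; return (result, word mults)."""
    s = len(a)
    result, acc, mults = 0, 0, 0
    for k in range(2 * s - 1):                  # column k collects a[i]*a[j], i + j == k
        for i in range(max(0, k - s + 1), k // 2 + 1):
            j = k - i                           # here i <= j, so each pair is visited once
            p = a[i] * a[j]
            mults += 1
            acc += p if i == j else 2 * p       # off-diagonal products count twice
        result |= (acc & MASK) << (W * k)       # emit one result word per column
        acc >>= W                               # carry into the next column
    result |= acc << (W * (2 * s - 1))          # final carry is the top word
    return result, mults
```

With s = 8 words (256-bit operands) the routine performs s(s + 1)/2 = 36 word multiplications instead of s^2 = 64, a saving of (s^2 - s)/2 = 28.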
ISBN: 9781728160443 (print)
This work presents a hardware accelerator that optimizes latency and area at the same time to improve the performance of the point multiplication process in Elliptic Curve Cryptography. In order to reduce the overall computation time in the proposed 2-stage pipelined architecture, the point addition and point doubling instructions are rescheduled and the required memory locations are used efficiently. Furthermore, a 41-bit multiplier is also proposed. FPGA and ASIC implementation results are provided. The performance comparison with state-of-the-art implementations, in terms of latency and area, proves the significance of the proposed accelerator.
Cryptography plays a major role in all modern applications, where Galois field (GF) arithmetic circuits are inevitable. In this paper, an asynchronous $GF(2^m)$ and m-bit GF(p) multiplier, inverter, and exponentiator are proposed, where the hardware is repeatedly reused for m iterations without synchronous registers ($m = \log_2 p$). This paper also proposes an asynchronous implementation of $GF(2^{163})$ affine-coordinate-based ECC scalar multiplication that includes point addition and point doubling. Here, the inverse is calculated using Fermat's little theorem. The entire scalar multiplication is done using only two $GF(2^{163})$ multipliers without any hardware registers, which are replaced by completion detection logic. The same proposed logic is used in the asynchronous 128-bit AES design. The power dissipation of the proposed designs is much lower than that of the existing designs due to the elimination of synchronous registers. The proposed asynchronous logic is free from glitches and metastability. The proposed asynchronous $GF(2^{16})$ multiplier design achieves a 99.6% improvement in switching power reduction over the scalable Montgomery [5] based multiplier using 45 nm CMOS technology. © 2018 Elsevier B.V. All rights reserved.
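Inversion through Fermat's little theorem needs nothing beyond a field multiplier, since a^(2^m - 2) = a^(-1) for every nonzero a in GF(2^m). The sketch below is a software illustration, not the asynchronous circuit of the abstract. It assumes the NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 for GF(2^163), represents field elements as Python integers whose bits are polynomial coefficients, and uses hypothetical names `gf_mul` and `gf_inv`.

```python
M = 163
# Assumed reduction polynomial: x^163 + x^7 + x^6 + x^3 + 1 (NIST binary field)
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^163); inputs must be below 2**163."""
    r = 0
    while b:
        if b & 1:
            r ^= a            # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1               # multiply a by x
        if a >> M:
            a ^= POLY         # reduce as soon as the degree reaches 163
    return r


def gf_inv(a):
    """Inverse by Fermat's little theorem: a^(2^163 - 2), square-and-multiply."""
    result, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result
```

The exponentiation is just a sequence of multiplications and squarings, which is consistent with a design that reuses a small number of multipliers for the whole scalar multiplication.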
Systolic all-one-polynomial (AOP) multipliers usually suffer from high register complexity, especially on field-programmable gate array (FPGA) platforms where register resources are not abundant. In this paper, we show that AOP-based systolic multipliers can easily achieve low register-complexity implementations and that the proposed architectures can be employed as computation cores to derive efficient implementations of systolic Montgomery multipliers based on trinomials. First, we propose a novel data broadcasting scheme in which the register complexity of existing AOP-based systolic multipliers is significantly reduced. We find that the modified AOP-based structure can be packed as a standard computation core. Next, we propose a novel Montgomery multiplication algorithm that can fully employ the proposed AOP-based computation core. The proposed Montgomery algorithm employs a novel precomputed modular operation, and the systolic structures based on this algorithm fully inherit the advantages of the AOP-based core (low register complexity, low critical-path delay, and low latency), apart from some marginal hardware overhead introduced by a precomputation unit. The proposed architectures are then implemented with Xilinx ISE 14.1, and it is shown that the proposed designs achieve at least 61.8% less area-delay product and 47.6% less power-delay product than the best of the competing designs.
ISBN: 9781509049967 (print)
Due to the globalization of IC, hardware is defenseless against new sorts of assaults, for example, counterfeiting, figuring out and IP piracy. The logic locking technique is used for hardware security. Logic locking conceals the functionality and implementation of a design by inserting additional gates into the original design. The gates inserted for the locking are called key-gates. For the locked design to display its correct functionality (i.e. produce correct outputs), a valid key has to be provided to it. A Pseudo Random Number Generator (PRNG) is utilized to randomly generate the sequence of key values. The PRNG is also connected to the input to randomly generate the input values for automatic testing (BIST testing). This approach increases the security level and is hence applied to a cryptographic algorithm. The Montgomery algorithm is the cryptographic algorithm that will be tested with the logic locking technique.
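The effect of key-gates can be simulated on a very small circuit. This is a minimal sketch under assumptions: the three-input function, the placement of one XOR and one XNOR key-gate, and the names `original`, `locked` and `key_unlocks` are all invented for illustration and are not taken from the paper. The locked circuit matches the original on every input only when the correct key is applied.

```python
from itertools import product


def original(a, b, c):
    """Unlocked reference circuit: (a AND b) OR c."""
    return (a & b) | c


def locked(a, b, c, k0, k1):
    """Same circuit with two key-gates inserted on internal wires."""
    w = (a & b) ^ k0             # XOR key-gate: transparent when k0 == 0
    return ((w | c) ^ k1) ^ 1    # XNOR key-gate: transparent when k1 == 1


CORRECT_KEY = (0, 1)             # hypothetical key for this toy circuit


def key_unlocks(key):
    """True if the locked circuit equals the original on all 8 input patterns."""
    return all(
        original(*bits) == locked(*bits, *key)
        for bits in product((0, 1), repeat=3)
    )
```

Enumerating all four candidate keys with `key_unlocks` shows that each wrong key corrupts at least one output, which is the property a locked design depends on.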