FPGA Realization of Low Register Systolic All-One-Polynomial Multipliers Over GF(2^m) and Their Applications in Trinomial Multipliers

Authors: Chen, Pingxiuqi; Basha, Shaik Nazeem; Mozaffari-Kermani, Mehran; Azarderakhsh, Reza; Xie, Jiafeng

Affiliations: Wright State Univ, Dept Elect Engn, Dayton, OH 45435, USA; Rochester Inst Technol, Dept Microelect & Elect Engn, Rochester, NY 14623, USA; Rochester Inst Technol, Dept Comp Engn, Rochester, NY 14623, USA

Publication: IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IEEE Trans Very Large Scale Integr VLSI Syst)

Year, volume, issue: 2017, Vol. 25, No. 2

Pages: 725-734

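Subject classification: 0808 [Engineering: Electrical Engineering]; 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees may be conferred in Engineering or Science)]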

Keywords: All-one polynomial (AOP); finite field multiplication; irreducible trinomials; low register complexity; Montgomery algorithm; systolic structure

Abstract: Systolic all-one-polynomial (AOP) multipliers usually suffer from high register complexity, especially on field-programmable gate array (FPGA) platforms, where register resources are limited. In this paper, we show that AOP-based systolic multipliers can readily achieve low register-complexity implementations, and that the proposed architectures can be employed as computation cores to derive efficient implementations of systolic Montgomery multipliers based on trinomials. First, we propose a novel data broadcasting scheme that significantly reduces the register complexity of existing AOP-based systolic multipliers. We find that the modified AOP-based structure can be packed as a standard computation core. Next, we propose a novel Montgomery multiplication algorithm that can fully employ the proposed AOP-based computation core. The proposed Montgomery algorithm employs a novel precomputed modular operation, and the systolic structures based on this algorithm fully inherit the advantages of the AOP-based core (low register complexity, low critical-path delay, and low latency), apart from some marginal hardware overhead introduced by a precomputation unit. The proposed architectures are then implemented using Xilinx ISE 14.1, and it is shown that the proposed designs achieve at least 61.8% less area-delay product and 47.6% less power-delay product than the best of the competing designs.
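
Note: The abstract does not spell out the arithmetic behind an AOP-based multiplier, so the following is a minimal software sketch of the standard property such multipliers rely on, not the paper's systolic architecture, data broadcasting scheme, or Montgomery algorithm. It assumes the field is defined by the all-one polynomial p(x) = 1 + x + ... + x^m and that p(x) is irreducible for the chosen m (true for m = 2 and m = 4, for example). Because (x + 1)p(x) = x^(m+1) + 1, the product can be computed modulo x^(m+1) + 1 as a cyclic convolution and then reduced by p(x) in one step. The function name `aop_multiply` and the bit-packed integer representation are assumptions made for illustration.

```python
def aop_multiply(a: int, b: int, m: int) -> int:
    """Multiply two elements of GF(2^m) defined by the all-one polynomial.

    Elements are packed into integers: bit i holds the coefficient of x^i,
    so valid inputs satisfy 0 <= a, b < 2**m. The caller is assumed to pick
    an m for which 1 + x + ... + x^m is irreducible over GF(2).
    """
    n = m + 1                  # x^n = 1 modulo the all-one polynomial
    mask = (1 << n) - 1        # n ones; also the bit pattern of p(x) itself
    acc = 0
    for i in range(n):
        if (b >> i) & 1:
            # Multiplying by x^i modulo x^n + 1 is a cyclic left rotation by i.
            rotated = ((a << i) | (a >> (n - i))) & mask
            acc ^= rotated     # addition in GF(2) is XOR
    # If the x^m coefficient is set, add p(x) once to bring the degree below m.
    if (acc >> m) & 1:
        acc ^= mask
    return acc


if __name__ == "__main__":
    # In GF(2^2) with p(x) = x^2 + x + 1: x * x = x + 1.
    print(bin(aop_multiply(0b10, 0b10, 2)))
```

The rotation step is what makes AOP fields attractive in hardware: reduction modulo x^(m+1) + 1 needs no logic beyond rewiring, and the final correction is a single conditional bitwise complement.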
