Transformer-based models have achieved remarkable performance across various tasks, but their computational complexity is an obstacle to deployment on resource-constrained devices. To this end, this paper proposes NPLA, an efficient framework for approximating non-linear operations during Transformer inference on hardware accelerators. Specifically, NPLA approximates non-linear operations with non-uniform piecewise linear functions and converts the coefficients directly into LUTs for hardware implementation. Experimental results show that NPLA reduces hardware cost by 13.43× in LUTs and 1.98× in DSPs compared with the state-of-the-art method.
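The idea of a non-uniform piecewise linear approximation backed by a coefficient table can be sketched as follows. This is a minimal illustration, not the NPLA method itself: the choice of GELU as the target function, the breakpoint positions, the chord-based fitting, and the names `build_table` and `pwl_eval` are all assumptions made for the example.

```python
import bisect
import math


def gelu(x):
    # Reference GELU computed with the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_table(f, breakpoints):
    # One (slope, intercept) pair per segment; each segment is the chord
    # joining f at two consecutive breakpoints.
    table = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        slope = (f(hi) - f(lo)) / (hi - lo)
        table.append((slope, f(lo) - slope * lo))
    return table


def pwl_eval(x, breakpoints, table):
    # Clamp to the covered range, look up the segment, then do one
    # multiply-add, which is the part a hardware LUT would serve.
    x = min(max(x, breakpoints[0]), breakpoints[-1])
    i = min(bisect.bisect_right(breakpoints, x) - 1, len(table) - 1)
    slope, intercept = table[i]
    return slope * x + intercept


# Hypothetical non-uniform breakpoints: denser near zero, where GELU
# curves the most, and sparser in the nearly linear tails.
BREAKPOINTS = [-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0]
TABLE = build_table(gelu, BREAKPOINTS)

# Worst-case absolute error on a 0.01 grid over [-4, 4].
max_err = max(
    abs(pwl_eval(k / 100.0, BREAKPOINTS, TABLE) - gelu(k / 100.0))
    for k in range(-400, 401)
)
```

Inputs outside [-4, 4] are clamped here purely to keep the sketch short; a real design would need to handle the tails explicitly.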
Automated solutions for sizing analog circuits have gained significant interest due to the labor-intensive nature of the task in a typical design cycle, especially with technology development and circuit scaling. This...
Despite efforts toward analog circuit design automation, complex analog circuit design still requires extensive manual iteration, making it labor-intensive and time-consuming. Recently, reinforcement learning (RL) algorithms have been successfully demonstrated for analog circuit design optimization. However, a robust and highly efficient RL method for designing analog circuits with a complex design space has not yet been fully explored. In this work, inspired by multiagent planning theory as well as human expert design practice, we propose a multiagent-based RL (MA-RL) framework to tackle this issue. In particular, we (i) partition complex analog circuits into several sub-blocks based on topology information, effectively reducing the complexity of the design search space; (ii) leverage MA-RL for circuit optimization, where each agent corresponds to a single sub-block and the interactions between agents mimic the design tradeoffs that human experts make between circuit sub-blocks; (iii) introduce multiagent twin-delayed techniques to further improve training stability and achieve higher performance. Experiments on two different analog circuit topologies and knowledge transfer between two technology nodes are demonstrated. The results show that the MA-RL framework achieves the best FoM for complex analog circuit design. This work sheds light on future large-scale analog circuit system design automation.
Currently, digital certificate systems based on blockchain have been extensively developed and ***, most of them do not take into account the certificate *** evaluate the credibility of certificates issued by educational institutions, we propose a novel blockchain-based system with credit self-adjustment (BC-CS). In BC-CS, employers can provide feedback according to the performances of their employees (i.e., students) holding different *** on the feedback, BC-CS automatically adjusts the certificate credits by using our proposed credit self-adjustment *** verify the feasibility of our proposed system, a decentralized application prototype has been developed on an Ethereum *** results demonstrate that the proposed system can fully support multistep accreditation and automatic adjustment for certificate credit.
A 6-Gb/s half-rate current mode logic (CML) transmitter has been designed in TSMC 28 nm CMOS technology. It employs a 3-tap 3-bit feed-forward equalizer (FFE), an analog duty cycle correction (DCC) module for the half-rate clock, and a 5-bit output impedance calibration module for a 100-ohm differential load. Post-layout simulation indicates that the FFE can compensate for up to 4.7 dB of channel loss at 3 GHz while maintaining a 410 mVpp eye height. The DCC module can correct ±30% duty cycle mismatch, and the maximum deviation of the calibrated impedance is 2.7%. Implemented in 28 nm CMOS technology, the transmitter delivers 6-Gb/s data (2^15-1 PRBS) with 4.7 dB channel attenuation at 3 GHz. The transmitter (excluding the clock-generating PLL) consumes 27.67 mA from a 0.9 V supply and occupies a die area of 214 μm × 108 μm.
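As a behavioral illustration of what a 3-tap FFE does, the sketch below models it as a three-coefficient FIR filter over NRZ symbols. The tap arrangement (one pre-cursor, one main, one post-cursor), the coefficient values, the edge handling, and the function name `ffe_3tap` are assumptions for the example; the abstract does not specify them, and this is not a model of the actual circuit.

```python
def ffe_3tap(bits, c_pre, c_main, c_post):
    # Map bits to NRZ symbols: 1 -> +1.0, 0 -> -1.0.
    symbols = [1.0 if b else -1.0 for b in bits]
    # Repeat the first and last symbol so the edge outputs are defined.
    padded = [symbols[0]] + symbols + [symbols[-1]]
    out = []
    for n in range(len(symbols)):
        prev_sym = padded[n]      # symbol before the current one
        cur_sym = padded[n + 1]   # current symbol
        next_sym = padded[n + 2]  # symbol after the current one
        out.append(c_pre * next_sym + c_main * cur_sym + c_post * prev_sym)
    return out


# Hypothetical coefficients: no pre-cursor tap and a negative post-cursor
# tap, so transitions get full swing and repeated bits are de-emphasized.
levels = ffe_3tap([0, 0, 1, 1, 1, 0], c_pre=0.0, c_main=0.75, c_post=-0.25)
```

With these values, a bit that differs from its predecessor is driven at full amplitude while repeated bits are driven at half amplitude, which is the pre-compensation that offsets high-frequency channel loss.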
High-Level Synthesis (HLS) of approximate computing circuits generates circuits based on functional units such as adders and multipliers. An approximate library containing approximate functional units is firstly built...
Perceptual image quality assessment (IQA) has attracted significant attention in recent years. It has been shown that both the global score and an image’s visual saliency (VS) are consistent with subjective evaluation. The global ...
RSA and elliptic curve cryptography (ECC) algorithms are widely used in authentication, data security, and access *** this paper, we analyze the basic operation of the ECC and RSA algorithms and optimize their modular multiplication and modular inversion *** then propose a reconfigurable modular operation architecture, with a mix-memory unit and double multiply-accumulate structures, to realize our unified, asymmetric cryptosystem structure in an operational *** with 55-nm CMOS process, our design runs at 588 MHz and requires only 437801 µm² of hardware *** proposed design takes 21.92 and 23.36 mW for 2048-bit RSA modular multiplication and modular inversion respectively, as well as 16.16 and 15.88 mW to complete 512-bit ECC dual-field modular multiplication and modular inversion *** is more energy-efficient and flexible than existing single algorithm *** with existing multiple algorithm units, our proposed method shows better *** operation unit is embedded in a 64-bit RISC-V processor, realizing key generation, encryption and decryption, and digital signature functions of both RSA and *** proposed design takes 0.224 and 0.153 ms for 256-bit ECC point multiplication in GF(p) and GF(2^m) respectively, as well as 0.96 ms to complete 1024-bit RSA exponentiation, meeting the demand for high energy efficiency.
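For readers unfamiliar with the modular operations named above, the following sketch shows textbook software versions of modular inversion and modular exponentiation over the integers. The abstract does not say which algorithms the hardware uses, so the extended Euclidean algorithm and square-and-multiply shown here are stand-ins chosen for clarity, and the function names `mod_inverse` and `mod_exp` are hypothetical.

```python
def mod_inverse(a, m):
    # Extended Euclidean algorithm: find s such that a * s = 1 (mod m).
    old_r, r = a % m, m
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("a has no inverse modulo m")
    return old_s % m


def mod_exp(base, exponent, modulus):
    # Right-to-left square-and-multiply: one modular squaring per exponent
    # bit, plus one modular multiplication for each set bit.
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            result = (result * base) % modulus
        base = (base * base) % modulus
        exponent >>= 1
    return result


# Toy RSA parameters (far too small for real use), for illustration only.
p, q, e = 61, 53, 17
n = p * q
d = mod_inverse(e, (p - 1) * (q - 1))  # private exponent
ciphertext = mod_exp(65, e, n)
recovered = mod_exp(ciphertext, d, n)
```

Both routines reduce to repeated modular multiplication and reduction, which is why a shared modular arithmetic unit can serve RSA and prime-field ECC alike.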
In the digital circuit design stage, the analysis and prediction of aging effects can help improve circuit reliability. In this paper, we first propose a fast aging-aware static timing analysis prediction approach f...
Existing learning-based inpainting methods have recently reached notable success in filling irregular holes. However, the quantity of network parameters in these methods also grows rapidly, thus making them difficult ...