Sustained energy efficiency improvements have been instrumental to the rapid evolution of electronic systems with computing, sensing, communication, and storage capabilities. Energy efficiency improvements are crucial for continued performance increases under a limited power budget and for reduced operating cost, as well as for untethering traditionally wired systems. This holds for high-performance systems subject to heat-removal limitations (e.g., server blades), and for systems where the cost of electricity is a major fraction of the total cost, as in datacenters [1] or the more recent crypto-currency mining endeavors [2]. Energy reductions are also critical in portable electronics, due to their limited thermal budget and battery energy. Similarly, energy reductions are essential in miniaturized energy-autonomous systems such as sensor nodes, hearables, and wearables, due to their tightly constrained energy sources [3]. Overall, energy efficiency improvements have historically enabled the continuous size down-scaling and lifetime extension of electronic systems (see, e.g., [4]).
This paper introduces a novel method for designing approximate circuits by fabricating and exploiting false timing paths, i.e., critical paths that cannot be logically activated. This makes it possible to strongly relax timing constraints while guaranteeing a minimal and controlled behavioral change. The technique is applied to an approximate adder architecture, called the Carry Cut-Back Adder (CCBA), in which high-significance stages can cut the carry propagation chain at lower-significance positions. This lightweight approach prevents the logic activation of the carry chain, improving performance and energy efficiency while guaranteeing low worst-case errors. A design methodology is presented, together with implementation, error optimization, and design-space minimization. The CCBA is shown to achieve extremely high accuracy while delivering significant circuit savings. For a worst-case precision of 99.999%, energy savings of up to 36% are demonstrated compared with exact adders. Finally, an industry-oriented comparison of 32-bit approximate and truncated adders is carried out in terms of mean and worst-case relative errors. The CCBA outperforms both state-of-the-art approximate adders and truncated adders for high-accuracy, low-power circuits, confirming the interest of the proposed concept for building highly efficient approximate or precision-scalable hardware accelerators.
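To make the false-path idea concrete, here is a minimal behavioral sketch in Python. It assumes a single cut-back on a plain ripple-carry chain, with the cut forcing the carry to 0; the parameter names (`cut`, `det_lo`, `det_w`) are hypothetical, and the actual CCBA may use several cuts and a different cut value. Whenever every bit in a high-significance window propagates, the carry entering bit `cut` is cleared, so a carry can never ripple from the LSB all the way to the MSB: that path becomes false. In this simplified model a dropped carry costs exactly 2^cut, and the cut only fires when the exact sum is at least 2^det_lo * (2^det_w - 1), which bounds the relative error.

```python
def ccba_add(a: int, b: int, n: int, cut: int, det_lo: int, det_w: int) -> int:
    """Behavioral model of an n-bit ripple adder with ONE carry cut-back.

    Hypothetical illustration parameters (not taken from the paper):
      cut:    low-significance bit whose incoming carry may be forced to 0
      det_lo: lowest bit of the high-significance detection window
      det_w:  width of the detection window
    Returns the (n+1)-bit approximate sum.
    """
    if not (0 < cut < det_lo and det_w > 0 and det_lo + det_w <= n):
        raise ValueError("need 0 < cut < det_lo and det_lo + det_w <= n")
    mask = (1 << det_w) - 1
    # The cut fires when every window bit propagates (a_i XOR b_i == 1),
    # i.e. exactly when a long carry could otherwise ripple through the window.
    cut_active = (((a ^ b) >> det_lo) & mask) == mask
    carry, s = 0, 0
    for i in range(n):
        if i == cut and cut_active:
            carry = 0  # carry cut-back: the full-length chain becomes a false path
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return s | (carry << n)


# 8-bit adder, cut at bit 2, detection window on bits 5..6
print(ccba_add(0b01100011, 0b00000011, 8, 2, 5, 2))  # 98 (exact sum is 102)
```

Moving `cut` further below `det_lo` tightens the error bound in this model, at the price of a longer remaining carry path from the cut upward; that trade-off is the kind of design space a methodology such as the one described above would explore.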
This paper presents OCEAN, an artificial neural network processor designed to accelerate gated recurrent unit (GRU) inference and on-chip incremental learning for sequential modeling. Implemented in 65-nm CMOS with a silicon area of 2.9 × 3.5 mm², the OCEAN processor features a 32-bit reduced instruction set computing (RISC) core, 64 KB of on-chip SRAM, and eight 16-bit four-cell GRU accelerators for inference and gradient computation. Each GRU accelerator is optimized and enhanced for efficient gradient computation. The processor is measured to consume 155 mW at its peak clock rate of 400 MHz with a 1.2 V supply, or 6.6 mW at 20 MHz and 0.8 V. Both inference and on-chip incremental learning are demonstrated on well-known AI tasks such as handwritten digit recognition, semantic natural language processing, and seizure detection from biomedical waveforms.
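For readers less familiar with the GRU, the following NumPy sketch shows one time step of the standard cell (Cho et al. formulation) that such accelerators evaluate. It is a floating-point illustration only: OCEAN's 16-bit fixed-point arithmetic, gate ordering, and gradient datapath are not described in the abstract, and the parameter names and the 4-unit hidden size (loosely echoing the "four-cell" accelerators) are assumptions.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def gru_step(x, h, p):
    """One GRU time step; p maps hypothetical names to weights and biases."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])        # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])        # reset gate
    n = np.tanh(p["Wn"] @ x + p["Un"] @ (r * h) + p["bn"])  # candidate state
    return z * h + (1.0 - z) * n                            # blend old and new state


def init_params(n_in, n_hidden, rng, scale=0.5):
    """Random, untrained weights for illustration only."""
    p = {}
    for g in "zrn":
        p["W" + g] = scale * rng.standard_normal((n_hidden, n_in))
        p["U" + g] = scale * rng.standard_normal((n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p


rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hidden=4, rng=rng)
h = np.zeros(4)
for x in rng.standard_normal((10, 3)):  # a 10-step input sequence
    h = gru_step(x, h, params)
print(np.round(h, 3))
```

Because each step is a convex blend of the previous state and a `tanh` output, the hidden state stays in (-1, 1), which is one reason GRUs map comfortably onto narrow fixed-point datapaths.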
This paper presents a learning algorithm for a one-layer vector-matrix multiplier (VMM) + k-winner-take-all (WTA) classifier architecture on a large-scale field programmable analog array (FPAA). The technique enables embedded, ultra-low-power implementations of machine learning techniques typically considered for large servers. To develop this training algorithm, the paper starts from fundamental equivalent transformations of VMM + WTA classifier networks. A VMM + WTA structure can exactly compute a self-organizing map (SOM) or vector quantization (VQ) operation, in addition to other transformations. Concepts from SOM, VQ, and Gaussian mixture model learning are used in the training algorithm of this single-layer network. An on-chip clustering step determines the initial weight set for ideal target and background values. Null symbols are important for the algorithm and are set from the midpoints of the target values. Results are shown both as numerical simulations of the VMM + WTA learning network, which illustrate some limitations of numerical differential equation simulation for this problem, and as experimental measurements on a system-on-chip (SoC) FPAA device.
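The VMM + WTA equivalence with VQ follows from expanding the squared distance: argmin_j ||x - w_j||^2 = argmax_j (w_j · x - ||w_j||^2 / 2), because ||x||^2 is the same for every codeword. The NumPy sketch below checks this numerically by folding the bias term into the matrix through a constant-1 input. It is an idealized digital model, not the FPAA circuit or the paper's training algorithm; the function names and sizes are illustrative.

```python
import numpy as np


def vq_nearest(x, W):
    """Reference VQ: index of the codeword (row of W) closest to x."""
    return int(np.argmin(np.sum((W - x) ** 2, axis=1)))


def vmm_kwta(x, W, k=1):
    """Same decision as one vector-matrix multiply followed by k-WTA.

    A constant-1 input is appended so the VMM also adds the per-row
    bias -||w_j||^2 / 2; the k largest outputs are the winners.
    """
    bias = -0.5 * np.sum(W ** 2, axis=1, keepdims=True)
    W_aug = np.hstack([W, bias])
    y = W_aug @ np.append(x, 1.0)     # a single VMM operation
    return np.argsort(y)[::-1][:k]    # indices of the k largest outputs


rng = np.random.default_rng(1)
W = rng.standard_normal((8, 5))       # 8 hypothetical codewords of dimension 5
x = rng.standard_normal(5)
print(vq_nearest(x, W), int(vmm_kwta(x, W, k=1)[0]))  # the same index twice
```

With k = 1 this reproduces a VQ or SOM winner selection; larger k returns the k nearest codewords, which is the k-WTA variant named in the abstract.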
This paper summarizes our latest results on integrated all-to-all optical interconnect systems using compact, low-loss silicon nitride (SiN) arrayed waveguide grating routers (AWGRs) through AIM Photonics' multi-project wafer (MPW) services. In particular, we have designed, taped out, and initially characterized a chip-scale silicon photonic low-latency interconnect optical network switch (Si-LIONS) system with an 8 × 8, 200 GHz spacing cyclic SiN AWGR, 64 microdisk modulators, and 64 on-chip germanium photodetectors (PDs). The designed 8 × 8 SiN AWGR has a measured insertion loss of 1.8 dB and a crosstalk of -13 dB, with a footprint of 1.3 mm × 0.9 mm. We measured error-free operation of the microdisk modulator at 10 Gb/s with a 1 Vpp voltage swing. We demonstrated wavelength routing with error-free data transmission using the on-chip modulator, the SiN AWGR, and an external PD. We have also designed and taped out an optical interposer version of the all-to-all system using SiN waveguides and low-loss chip-to-interposer couplers. Finally, we present preliminary designs and results for 16 × 16 and 32 × 32 SiN AWGRs, and discuss the possibility of scaling beyond 1024 × 1024 all-to-all interconnections with a reduced number of wavelengths (e.g., 64) using the Thin-CLOS architecture.
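The all-to-all property rests on the cyclic wavelength routing of the AWGR. The sketch below uses the common idealized model in which wavelength index k entering input i exits output (i + k) mod N; this port and wavelength numbering convention is an assumption, and real devices may map ports differently. It checks that every input reaches every output and that inputs sharing a wavelength never collide at an output.

```python
def awgr_output(src: int, wl: int, n: int) -> int:
    """Idealized cyclic N x N AWGR (assumed convention): wavelength index
    wl entering input port src leaves output port (src + wl) mod n."""
    return (src + wl) % n


def wavelength_for(src: int, dst: int, n: int) -> int:
    """Wavelength index a transmitter at src must use to reach dst."""
    return (dst - src) % n


def is_all_to_all(n: int) -> bool:
    # Every input reaches every output, one distinct wavelength per destination.
    reaches_all = all(
        {awgr_output(s, wl, n) for wl in range(n)} == set(range(n))
        for s in range(n))
    # For a fixed wavelength, the n inputs land on n distinct outputs.
    contention_free = all(
        len({awgr_output(s, wl, n) for s in range(n)}) == n
        for wl in range(n))
    return reaches_all and contention_free


print(is_all_to_all(8), wavelength_for(2, 7, 8))  # True 5
```

In this idealized model an N × N AWGR needs N wavelengths, which is why a reduced wavelength count matters when scaling toward 1024 ports.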
Presents a listing of the editorial board, board of governors, current staff, committee members, and/or society editors for this issue of the publication.
Emerging millimeter-wave (mmW) wireless systems require beamforming and multiple-input multiple-output (MIMO) approaches to mitigate path loss, obstructions, and attenuation in the communication channel. Sharp mmW beams are essential for this purpose and must support baseband bandwidths of at least 1 GHz to enable higher system capacity. This paper explores a baseband multi-beamforming method based on the spatial Fourier transform. Approximate computing techniques are used to derive a low-complexity fast algorithm with sparse factorizations that map neatly to integer W/L ratios in CMOS current mirrors. The resulting approximate fast Fourier transform (FFT) can thus be efficiently realized with CMOS analog integrated circuits to generate multiple parallel mmW beams in both transmit and receive modes. The paper proposes both 8- and 16-point approximate-FFT algorithms, together with circuit theory and design information for 65-nm CMOS implementations. Post-layout simulations of the 8-point circuit in Cadence Spectre show well-defined mmW beam shapes, a baseband bandwidth of 2.7 GHz, a power consumption of 70 mW, and a dynamic range greater than 42.2 dB. Preliminary experimental results confirm the basic functionality of the 8-beam circuit. Schematic-level analysis of the 16-beam I/Q version shows worst-case and average side-lobe levels of -10.2 dB and -12.2 dB at 1 GHz bandwidth, and -9.1 dB and -11.3 dB at 1.5 GHz bandwidth. The proposed multibeam architectures can potentially reduce circuit area and power while meeting the bandwidth requirements of emerging 5G baseband systems.
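The spatial-Fourier-transform principle can be illustrated with the exact DFT. Assuming a uniform linear array with half-wavelength element spacing (the abstract does not state the array geometry), DFT output k collects all the energy of a plane wave arriving at sin(theta) = 2k/N, wrapped to [-1, 1), so N outputs form N simultaneous beams. This NumPy sketch uses `np.fft.fft` as an ideal reference; it does not reproduce the paper's approximate sparse factorization or its current-mirror mapping.

```python
import numpy as np


def beam_sines(n):
    """sin(theta) of each DFT beam for an n-element array with lambda/2 spacing."""
    k = np.arange(n)
    return 2.0 * np.where(k < n // 2, k, k - n) / n


def plane_wave(n, sin_theta):
    """Narrowband element signals of a plane wave arriving from sin(theta)."""
    return np.exp(1j * np.pi * np.arange(n) * sin_theta)


def multibeam(x):
    """Exact spatial DFT: output k is the beam pointed at beam_sines(n)[k]."""
    return np.fft.fft(x)


N = 8
for k, s in enumerate(beam_sines(N)):
    y = np.abs(multibeam(plane_wave(N, s)))
    print(k, round(float(s), 3), int(np.argmax(y)))  # beam k peaks at output k
```

An approximate FFT such as the one proposed trades some of this ideal orthogonality (visible as higher side lobes) for far fewer, simpler multiplications that can be realized as current-mirror ratios.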