Large-scale parallel implementation of a matrix multiply-and-accumulate (MAC) core poses significant energy and area constraints in the analog voltage domain under reduced supply voltage. A spatial multi-bit sub-1-V time-domain matrix multiplier interface is presented using multi-bit back-gate-driven delay elements as a scalable alternative for various approximate computing applications. A single-chip solution is demonstrated for two application modes: a high-throughput digitally driven mode for acceleration and a low-energy analog front-end mode for sensing. In accelerate mode, the system achieves an aggregate throughput of 21.6 GMAC/s with 9 TOPS/W energy efficiency. In sense mode, the system exhibits an energy efficiency of 55.3 TOPS/W for classification. The proposed architecture uses 16 parallel 6-bit input vectors to perform matrix MAC computations using time-domain signal processing with 3-bit resistive weights at a sub-1-V supply of 0.7 V. An integrated speculative time-to-digital converter is employed for 6-bit time-domain quantization with an on-chip mismatch calibration scheme. The prototype is fabricated in 65-nm CMOS technology and occupies an active area of 0.04 mm². The system performs image recognition of handwritten digits using a machine learning scheme and demonstrates an average classification accuracy of 84.3% on the MNIST dataset. The resultant energy per MAC computation in the proposed spatial architecture is about 15x lower than that of a digital CMOS combinational logic-based parallel-tree MAC.
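To make the data flow in this abstract concrete, the following is a minimal behavioural sketch of a time-domain MAC followed by an ideal quantizer. It is not the circuit described in the paper. The unsigned weights, the assumption that each element contributes a delay proportional to input times weight, the linear 6-bit quantizer, and the names `time_domain_mac` and `t_unit` are all illustrative assumptions; only the vector width and bit widths come from the abstract.

```python
# Behavioural sketch only; the real chip uses delay elements and a speculative TDC.
N_INPUTS, X_BITS, W_BITS, TDC_BITS = 16, 6, 3, 6  # widths taken from the abstract


def time_domain_mac(x, w, t_unit=1.0):
    """Return (accumulated delay, quantized code) for one 16-element dot product.

    Assumption: element i adds a delay of x[i] * w[i] * t_unit, and an ideal
    linear converter maps the total delay onto 2**TDC_BITS levels.
    """
    if len(x) != N_INPUTS or len(w) != N_INPUTS:
        raise ValueError("expected 16 inputs and 16 weights")
    if any(not 0 <= xi < 2 ** X_BITS for xi in x):
        raise ValueError("inputs must fit in 6 bits")
    if any(not 0 <= wi < 2 ** W_BITS for wi in w):
        raise ValueError("weights must fit in 3 bits (assumed unsigned)")

    # Delays add along the chain, so the total delay encodes the dot product.
    total_delay = sum(xi * wi for xi, wi in zip(x, w)) * t_unit

    # Ideal quantizer: full scale corresponds to all inputs and weights at maximum.
    full_scale = N_INPUTS * (2 ** X_BITS - 1) * (2 ** W_BITS - 1) * t_unit
    code = round(total_delay / full_scale * (2 ** TDC_BITS - 1))
    return total_delay, code


if __name__ == "__main__":
    x = [10] * N_INPUTS
    w = [3] * N_INPUTS
    print(time_domain_mac(x, w))
```

The quantizer discards most of the precision of the exact dot product, which is consistent with the approximate-computing use case the abstract targets.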
The performance of multiplication is crucial for multimedia applications such as 3D graphics and signal processing systems, which depend on the execution of large numbers of multiplications. Previously reported algorithms mainly focused on rapidly reducing the partial-product rows down to the final sums and carries used for the final accumulation. These techniques mostly rely on circuit optimization and minimization of the critical paths. In this paper, an algorithm to achieve fast multiplication in two's complement representation is presented. Rather than focusing on reducing the partial-product rows down to final sums and carries, our approach strives to generate fewer partial-product rows, which lowers the overall operation time even before partial-product reduction techniques are applied. In addition to the speed improvement, our algorithm results in a true diamond shape for the partial-product tree, which is more efficient in terms of implementation. The synthesis results of our multiplication algorithm using the Artisan TSMC 0.13 µm 1.2 V standard-cell library show a 13 percent improvement in speed and a 14 percent improvement in power savings for 8-bit x 8-bit multiplications (10 percent and 3 percent, respectively, for 16-bit x 16-bit multiplications) when compared to conventional multiplication algorithms.
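The abstract does not spell out its own algorithm, so the sketch below does not attempt to reproduce it. Instead it shows the conventional radix-4 Booth recoding, a well-known technique that also reduces the number of partial-product rows (to n/2 for an n-bit two's-complement multiplier), purely to illustrate what generating fewer rows means. The function names are illustrative.

```python
def booth_radix4_digits(y, n_bits=8):
    """Recode an n-bit two's-complement multiplier into n/2 digits in {-2,...,2}."""
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    u = y & ((1 << n_bits) - 1)  # two's-complement bit pattern of y

    def bit(i):
        # Bit i of the multiplier; the position below the LSB reads as 0.
        return 0 if i < 0 else (u >> i) & 1

    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
            for i in range(n_bits // 2)]


def booth_multiply(x, y, n_bits=8):
    """Multiply x by y using one partial-product row per radix-4 Booth digit."""
    digits = booth_radix4_digits(y, n_bits)
    # Each row is 0, +/-x, or +/-2x, weighted by 4**i.
    rows = [d * x * 4 ** i for i, d in enumerate(digits)]
    return sum(rows), len(rows)


if __name__ == "__main__":
    product, n_rows = booth_multiply(-37, 101)
    print(product, n_rows)
```

For 8-bit operands this yields four rows instead of the eight produced by a plain shift-and-add scheme.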
ISBN: 9781509016426 (print)
The future of positron emission tomography (PET) is systems with ultra-precise coincidence time resolution (CTR) to advance time-of-flight PET (TOF-PET) performance. Current state-of-the-art commercial PET systems have 350-800 ps full-width-at-half-maximum (FWHM) timing performance, constraining annihilation events to lie somewhere within a 5–12 cm region along system detector response lines (LORs). This constraint is applied during the image reconstruction process to enhance image SNR for improved lesion detectability, increased accuracy and precision of lesion uptake measurements, less sensitivity to errors in data correction techniques (normalization, scatter, and attenuation corrections), lower injected dose, or shorter scan time. The effect of these improvements on image quality and accuracy scales with system CTR performance, and a long-standing milestone for the TOF-PET community is to drive system CTR towards 100 ps FWHM (1.5 cm localization along LORs). At this level of performance, a factor of five improvement in SNR can be realized compared to non-TOF imaging, with a transformational impact on quantitative PET imaging in many count-starved and contrast-limited scenarios. Traditional PET detector designs are not able to achieve this level of CTR performance, and thus new detector concepts and signal processing methods should be explored to advance system CTR.
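The localization figures quoted in this abstract follow from the standard time-of-flight relation Δx = c·Δt/2, where Δt is the coincidence time resolution. The short sketch below checks that arithmetic. The function name `tof_localization_cm` is illustrative, and the vacuum speed of light is used as an approximation.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_localization_cm(ctr_fwhm_ps):
    """Position uncertainty along an LOR (FWHM, cm) for a given CTR (FWHM, ps)."""
    ctr_s = ctr_fwhm_ps * 1e-12
    # The factor 1/2 arises because a shift of dx along the LOR changes the
    # arrival-time difference of the two photons by 2*dx/c.
    return C_M_PER_S * ctr_s / 2.0 * 100.0


if __name__ == "__main__":
    for ctr in (800, 350, 100):
        print(f"{ctr} ps -> {tof_localization_cm(ctr):.2f} cm")
```

With these inputs the relation gives roughly 12 cm, 5.2 cm, and 1.5 cm, matching the ranges stated in the abstract.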
In this paper, a high-performance fuzzy controller for a DC servo motor is developed. The system design and implementation procedures of a DC servo motor drive using the TMS320C14 digital signal processing chip are described. Experimental results are shown to confirm the satisfactory performance of the proposed controller.
ISBN: 9781479948741 (print)
Chaotic signal generators are currently important in cryptographic applications and chaotic communication systems. One significant application of chaotic oscillators is random number generation. In this paper, a new FPGA-based true random number generator system using a discrete-time chaotic signal generator is presented. The system implements the Sprott 94 G chaotic system on an FPGA using the IEEE 754 floating-point standard. To produce random bits, a quantization process is performed on the results produced by the chaotic oscillator unit. Furthermore, the XOR method is used as the restoring function to obtain a true random bit generator. The maximum operating frequency of the FPGA-based true random number generator reaches 399.383 MHz. A 20,000-bit sequence was generated by the designed system and saved to a test result file. The sequence was tested using the NIST test suite and the FIPS-140-1 standard, and successful results were obtained. It is concluded that the FPGA-based system can be used in cryptographic applications.
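The abstract names an XOR restoring function and a 20,000-bit test sequence without giving details, so the following is only a minimal sketch of how such post-processing and one FIPS 140-1 check are commonly done. Pairing non-overlapping bits is an assumption, not the paper's confirmed scheme. The biased software bit source is a hypothetical stand-in for the quantized chaotic oscillator output.

```python
import random


def xor_correct(bits):
    """XOR non-overlapping pairs of raw bits; the output has half the input length."""
    return [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)]


def monobit_pass(bits):
    """FIPS 140-1 monobit check: the count of ones in 20,000 bits must lie in (9654, 10346)."""
    if len(bits) != 20_000:
        raise ValueError("the monobit test is defined for a 20,000-bit sample")
    return 9_654 < sum(bits) < 10_346


if __name__ == "__main__":
    rng = random.Random(1)  # hypothetical stand-in for the chaotic bit source
    raw = [1 if rng.random() < 0.55 else 0 for _ in range(40_000)]  # biased bits
    corrected = xor_correct(raw)
    print("raw ones fraction:      ", sum(raw) / len(raw))
    print("corrected ones fraction:", sum(corrected) / len(corrected))
    print("monobit pass:", monobit_pass(corrected))
```

For independent bits with P(1) = p, the XOR of a pair has P(1) = 2p(1 − p), so a raw bias of 0.55 shrinks to 0.495 at the cost of halving the bit rate.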
A methodology for the design and analysis of the best fourth-order topology for sigma-delta modulators (SDM) is described. To determine a stable topology, theoretical analysis combined with DC analysis was used to find the ranges of loop coefficients that stabilize the system, while numerical analysis was employed to analyze those ranges. The analysis provided stable regions in the frequency domain, from which a set of loop coefficients for VLSI implementation was selected. With this set of coefficients, the simulated behavior of the fourth-order leapfrog topology indicated the potential application of the fourth-order topology to ultra-high-resolution signal processing systems.
ISBN: 0818624574 (print)
The proceedings contain 35 papers. The topics discussed include: optimization of parametric yield; wafer-scale massively parallel computing modules for fault-tolerant signal and data processing; defect tolerance and yield for a WSI rapid prototyping architecture; current-mode techniques for analog VLSI: technology and defect tolerance issues; circuit design for a large area high-performance crossbar switch; improved yield models for fault-tolerant random-access memory chips; knowledge-based electrical monitor approach using very large array yield structures to delineate defects during process development and production yield improvement; a model for enhanced manufacturability of defect tolerant integrated circuits; and neural networks on silicon: the mapping of hardware faults onto behavioral errors.
When the maximum delay spread exceeds the guard interval, a time domain equalization (TEQ) technique for the 54 Mbps IEEE 802.11a orthogonal frequency division multiplexing (OFDM) system is proposed in [1] to shorten the effective channel impulse response. The proposed algorithm has a reduced computational complexity for practical use. In this paper, the detailed design and implementation of the algorithm in a field programmable gate array (FPGA) are presented. In solving the linear equation Aw=B for the optimum TEQ coefficients, the matrix A is proved to be Hermitian and positive definite. The regularities between the elements of A are exploited to reduce hardware complexity. The LDL^T and LU decompositions are combined in the hardware design to find the TEQ coefficients in less than 4 μs. To compensate for the effective channel impulse response, a radix-4 pipeline fast Fourier transform (FFT) is implemented to perform zero-forcing equalization. The hardware design information is provided and the simulation results are compared with the theoretical values. The results verify that the chip functions properly at 54 Mbps.
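Because A is Hermitian and positive definite, the system Aw=B can be solved without pivoting by factoring A = L·D·Lᴴ and performing two triangular substitutions. The sketch below is a plain software illustration of that idea, not the paper's hardware design, which combines LDL^T and LU decompositions in a way the abstract does not detail. The function name `ldl_solve` and the 2×2 example matrix are illustrative.

```python
def ldl_solve(A, b):
    """Solve A w = b for Hermitian positive-definite A using A = L D L^H."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal of D; real and positive when A is Hermitian positive definite.
        d[j] = (A[j][j] - sum(abs(L[j][k]) ** 2 * d[k] for k in range(j))).real
        L[j][j] = 1 + 0j
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k].conjugate() * d[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / d[j]

    # Forward substitution: L y = b.
    y = [0j] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))

    # Diagonal scaling: D z = y.
    z = [y[i] / d[i] for i in range(n)]

    # Back substitution: L^H w = z.
    w = [0j] * n
    for i in reversed(range(n)):
        w[i] = z[i] - sum(L[k][i].conjugate() * w[k] for k in range(i + 1, n))
    return w


if __name__ == "__main__":
    A = [[4, 1 + 1j], [1 - 1j, 3]]  # Hermitian, positive definite
    b = [2 + 2j, 1 + 5j]            # equals A @ [1, 2j]
    print(ldl_solve(A, b))
```

The factorization needs no square roots, which is one reason LDL-type decompositions are attractive for fixed-latency hardware.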
Space-Time Block Coding (STBC) has become very popular as an efficient diversity creation technique. With extra transmission redundancy introduced, the receiver performance can be significantly improved via STBC. In this paper, we study the signal recovery problem for a physical Multiple-Input-Multiple-Output (MIMO) channel via an inverse FIR filterbank together with the STBC technique. The problem can be formulated within a multirate polyphase framework to reach an equivalent virtual MIMO system. It is shown that such a system enables equalization of ill-conditioned MIMO channels and can offer performance superior to that of direct equalization. Based on Generalized Bezout Identity theorems, the recoverability conditions are established. We also address the design problem for optimal noise resilience, which is critical from a practical perspective. Furthermore, STBC can flexibly reduce the transmission rate to enhance the equalization SNR. The roles of different STBC and equalizer parameters are analyzed for the optimal tradeoff among transmission rate, diversity gain, and implementation complexity.
ISBN: 9781457714016 (print)
The proceedings contain 35 papers. The topics discussed include: monitoring workers through wearable transceivers for improving work safety; indoor location system based on ZigBee devices and metric description graphs; implementation of an intelligent sensor for measurement and prediction of solar radiation and atmospheric temperature; design of a sensor network based security system; comparison and improvement of Dempster-Shafer models; an add-on solution to take measures automatically from critical care urine meters; enhancing time-frequency parameters estimation for Doppler ultrasound blood-flow signals (MR); activity monitoring and emergency warning with location information of the user; towards a more analytical training of neural networks and neuro-fuzzy systems; exploiting the functional training approach in radial basis function networks; and genetic algorithm for searching a Doppler resilient multilevel complementary waveform.