A novel hardware algorithm, an architecture, and an optimization technique for residue multipliers are introduced in this paper. The proposed architecture exploits certain properties of the bit products to achieve a low-complexity implementation via a set of introduced theorems that allow the definition of a graph-based design methodology. In addition, the proposed multiplier employs Canonic Signed Digit (CSD) encoding to minimize the number of bit products that must be processed. Performance data reveal that the introduced architecture reduces area × time complexity by up to 55% compared with the most efficient previously reported design.
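To illustrate why CSD encoding reduces the number of bit products, here is a minimal sketch of the standard conversion from a non-negative integer to signed digits in {-1, 0, 1}. It is not the paper's multiplier; the function names `to_csd` and `from_csd` are hypothetical and chosen only for this example.

```python
def to_csd(n: int) -> list[int]:
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or 1, and no two adjacent digits are both nonzero.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = []
    while n != 0:
        if n & 1:
            # n % 4 == 1 -> digit +1, n % 4 == 3 -> digit -1
            digit = 2 - (n % 4)
            n -= digit
        else:
            digit = 0
        digits.append(digit)
        n //= 2
    return digits


def from_csd(digits: list[int]) -> int:
    """Rebuild the integer from its signed digits (least significant first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def nonzero_count(digits: list[int]) -> int:
    """Number of nonzero digits, i.e. partial products a multiplier would need."""
    return sum(1 for d in digits if d != 0)


if __name__ == "__main__":
    value = 7
    csd = to_csd(value)
    print(csd, nonzero_count(csd), bin(value).count("1"))
```

For the value 7, plain binary `111` has three nonzero digits, whereas the signed-digit form 8 - 1 has two, so fewer partial products would have to be summed.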
Software defined radio (SDR) is an emerging paradigm for wireless terminals, in which the physical layer of communication protocols is implemented in software rather than by ASICs. Many of the current and next-generation wireless protocols include turbo coding because of its superior performance. However, turbo decoding is computationally intensive, and its low-power implementations have typically been in ASICs. This paper presents a case study of algorithm-architecture co-design of a turbo decoder for SDR. We present a programmable DSP architecture for SDR that includes a set of architectural features to accelerate turbo decoder computations. We then present a parallel window scheduling for the MAX-Log-MAP component decoder that matches well with the DSP architecture. Finally, we present a software implementation of a turbo decoder for W-CDMA on the DSP architecture and show that it achieves 2 Mbps decoding throughput.
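The MAX-Log-MAP decoder mentioned above is commonly described as replacing the exact log-domain sum used in Log-MAP with a plain maximum, which is cheap on a DSP. The following minimal sketch shows only that approximation, not the paper's decoder or scheduling; the function names `max_star` and `max_log` are hypothetical.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact log-domain sum: log(exp(a) + exp(b)), computed stably."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """MAX-Log-MAP approximation: drop the correction term, keep the maximum."""
    return max(a, b)


if __name__ == "__main__":
    for a, b in [(0.0, 0.0), (1.0, 3.0), (-5.0, 4.0)]:
        exact = max_star(a, b)
        approx = max_log(a, b)
        print(f"a={a}, b={b}, exact={exact:.4f}, approx={approx:.4f}")
```

The approximation error is largest when the two metrics are equal, where it reaches log 2, and shrinks as the metrics move apart.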
ISBN: (print) 9781467383196
To address the shortcomings of the commonly used long-time accumulation method, a new parallel PSO tracking algorithm based on a multi-core digital signal processing (DSP) parallel system was proposed. It greatly reduced the calculation cost and complexity, compensated the target accurately, and enhanced real-time processing. A total of three flight experiments were carried *** time both the conventional 10 ms accumulation method and the parallel PSO tracking method were adopted to track the target at the same *** experiment results showed that the parallel PSO tracking algorithm tracked the target with lower computational complexity and better real-time performance.
Application of partial-response (PR) signaling and maximum-likelihood sequence detection (MLSD) to digital magnetic recording has been shown in theory and practice to further increase the storage densities and reliability that systems using run-length limited (RLL) coding and peak detection (PD), still the prevalent signal processing techniques today, can currently achieve. In this paper, the realization of a digital recording system using PR class-IV signaling with MLSD (PRML) is described. To perform MLSD at the high data rates encountered in recording systems, a simple implementation of the Viterbi detector is developed based on a difference-metric algorithm. We present decision-directed schemes for gain control and timing recovery, for tracking variations of the gain and timing phase during data readback, and for fast initial adjustment from a known preamble. The dynamic behavior of the control algorithms is studied by computer simulations. Coding is used to facilitate timing recovery and gain control, to limit the path memory length of the Viterbi detector, and to allow fast and reliable startup of the receiver. The design and properties of rate-8/9 constrained codes are examined. Finally, the problem of equalization is addressed, and analog and combined analog/digital filter implementations are developed. A simple adaptive equalizer capable of compensating for variations of the recording channel characteristics with track radius and/or head-to-medium distance is described.
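As background for the PR class-IV signaling described above, the following minimal sketch computes the noiseless class-IV response, conventionally written as 1 - D², and checks the well-known property that it splits into two interleaved 1 - D (dicode) sequences. It does not reproduce the paper's difference-metric Viterbi detector; the function names and the zero initial state are assumptions made for this example.

```python
def pr4_output(symbols: list[int]) -> list[int]:
    """Noiseless PR class-IV response y[k] = a[k] - a[k-2], with a[k] = 0 for k < 0."""
    return [a - (symbols[k - 2] if k >= 2 else 0) for k, a in enumerate(symbols)]


def dicode_output(symbols: list[int]) -> list[int]:
    """Noiseless dicode response y[k] = a[k] - a[k-1], with a[-1] = 0."""
    out = []
    prev = 0
    for a in symbols:
        out.append(a - prev)
        prev = a
    return out


if __name__ == "__main__":
    data = [1, -1, -1, 1, 1, 1, -1, 1]  # +/-1 write symbols (illustrative only)
    y = pr4_output(data)
    print(y)
    # Even- and odd-indexed outputs each behave like an independent dicode channel.
    print(y[0::2] == dicode_output(data[0::2]))
    print(y[1::2] == dicode_output(data[1::2]))
```

Because the two interleaved subsequences are independent, each can in principle be detected separately with a two-state trellis, which is one reason class-IV detection can be kept simple at high data rates.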
Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computational cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication. However, the intensive multiply-and-accumulate (MAC) operations executed on CIM macros remain bottlenecks for further improvement of energy efficiency and throughput. To reduce computational costs, model compression is a widely studied method to shrink the model size. For implementation in a static random access memory (SRAM) CIM-based accelerator, the model compression algorithm must consider the hardware limitations of CIM macros. In this study, a software and hardware co-design approach is proposed to design MARS, an SRAM-based CIM (SRAM CIM) CNN accelerator that can utilize multiple SRAM CIM macros as processing units and support a sparse CNN, together with an SRAM CIM-aware model compression algorithm that considers the CIM architecture to reduce the number of network parameters. With the proposed hardware-software co-design method, MARS can reach over 700 and 400 FPS for CIFAR-10 and CIFAR-100, respectively. In addition, MARS achieves 52.3 and 88.2 TOPS/W in VGG16 and ResNet18, respectively.
ISBN: (print) 9781509021185
In the hardware implementation of any signal processing algorithm, computational time and hardware resources are crucial issues. This paper presents the design and implementation of a new wavelet filter architecture for power system harmonics estimation using the discrete wavelet packet transform (DWPT). Usually, DWPT provides coefficients as the output; however, the proposed architecture also includes provision for providing rms values directly. The proposed method reduces computational requirements and saves memory resources. Xilinx System Generator, a higher-abstraction-level tool, has been used to simulate and implement the proposed scheme on the Xilinx Artix-7 FPGA AC-701 board. The performance of the proposed architecture has been validated and compared through hardware co-simulation with a variety of synthetic and experimental signals.
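One way rms values can be obtained directly from wavelet packet coefficients is through energy preservation: for an orthonormal wavelet, the sum of squared coefficients over all bands equals the sum of squared samples. The following minimal sketch assumes the Haar wavelet and pure-Python lists, which are choices made for this example and not taken from the paper; the function names are hypothetical.

```python
import math


def haar_step(x: list[float]) -> tuple[list[float], list[float]]:
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_packet(x: list[float], levels: int) -> list[list[float]]:
    """Full packet tree: split every node at every level.

    len(x) must be divisible by 2**levels. Nodes are returned in natural
    (filter-bank) order, not sorted by frequency.
    """
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def band_rms(nodes: list[list[float]], n_samples: int) -> list[float]:
    """Per-band rms contribution, normalized by the original signal length."""
    return [math.sqrt(sum(c * c for c in node) / n_samples) for node in nodes]


if __name__ == "__main__":
    n = 64
    # Synthetic test signal: a fundamental plus a smaller 5th harmonic.
    x = [math.sin(2 * math.pi * k / n) + 0.2 * math.sin(2 * math.pi * 5 * k / n)
         for k in range(n)]
    rms_per_band = band_rms(wavelet_packet(x, 3), n)
    total_from_bands = math.sqrt(sum(r * r for r in rms_per_band))
    direct = math.sqrt(sum(v * v for v in x) / n)
    print(total_from_bands, direct)
```

The two printed values agree up to rounding, which is why per-band rms can be read off the coefficients without reconstructing the time-domain signal.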
Real-time biosignal classification in power-constrained embedded applications is a key step in designing portable e-health devices requiring hardware integration along with concurrent signal processing. This paper pre...
ISBN: (print) 0780385780
Non-verbal behaviors play a key role in making a Virtual Character appear life-like. We describe an extensible system for the specification, control and real-time generation of facial expressions and gestures. The system approximates, in an MPEG-4 based Virtual Character, the wide expressive range, dynamism (an expression's meaning significantly depends on its temporal evolution) and variability (an emotion is never expressed in exactly the same way by different people, or even by the same person at different times) typical of human non-verbal behavior. The MPEG-4 standard only allows high-level control of 6 basic emotions, and does not explicitly support the description of an expression's temporal evolution. Our approach has been to create a hierarchical model of expressiveness; expressions are defined in terms of parameterized functions controlling the trajectories of low-level animation parameters (by means of an XML-based Expression Definition Markup Language). The real-time generation of those expressions is performed by an Expression Synthesis Engine. The system allows expressivity to be modulated effectively both at design time (the developer tweaks the parameters to give the character a given expressive style) and at run time (the engine automatically changes the way in which an expression is performed each time), producing controllable but non-deterministic behavior patterns, a key factor for enhancing believability.
Algorithms for digital communication systems can be represented via a set of specialized signal processing (SP) blocks. Hence, in the implementation of these algorithms, the interconnection among the processing blocks...
ISBN: (print) 9781467391276
This paper explains the design of a real-time fiber optic intrusion sensing system. Unlike most of its counterparts, the system operates passively in the field without any compromise in performance. The optical signal processing and the electronics are discussed. Simulation results are included to support the design. Signal estimation is accomplished through a novel stochastic estimation technique. Extensive testing shows that the probability of detection (POD) of the system is greater than 90%.