In recent years, neural network-based Wake Word Spotting has achieved good performance on clean audio samples but still struggles in noisy environments. Audio-Visual Wake Word Spotting (AVWWS) has received considerable attention because visual lip movement information is not affected by complex acoustic scenes. Previous works usually use simple addition or concatenation for multi-modal fusion, so the inter-modal correlation remains relatively under-explored. In this paper, we propose a novel module called Frame-Level Cross-Modal Attention (FLCMA) to improve the performance of AVWWS systems. This module helps model multi-modal information at the frame level through synchronous lip movements and speech signals. We train the end-to-end FLCMA-based Audio-Visual Conformer and further improve performance by fine-tuning pre-trained uni-modal models for the AVWWS task. The proposed system achieves a new state-of-the-art result (4.57% WWS score) on the far-field MISP dataset.
In this work, we consider reconfigurable intelligent surface (RIS)-aided beyond-5G communication systems under two scenarios. First, we take the scenario where a RIS aided path a...
The global energy crisis has contributed to the development of integrated energy systems (IES). An IES enables the coupling, conversion, and utilisation of multiple forms of energy, such as electricity, heat and cooling. As renew...
ISBN (digital): 9781665488235
ISBN (print): 9781665488235
In this paper, an FIR filter using a Vedic Multiplier and Carry Look Ahead adder (VMCLA) is proposed for Electrocardiogram (ECG) denoising. A Vedic Multiplier based on the Urdhva Tiryaghbhyam technique is designed because of its faster and better computational ability. Further, the delay in the addition of partial products is reduced by using a Carry Look Ahead (CLA) adder. ECG signal processing and validation of performance parameters are done in MATLAB. The proposed FIR filter design is synthesized and simulated using Vivado Design Suite 2021.2 and implemented on a Kintex-7 FPGA board. The results show high performance and area efficiency compared to well-known prior-art filters.
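The "vertically and crosswise" idea behind the Urdhva Tiryaghbhyam technique can be illustrated in software. The sketch below is not the authors' hardware design: the function name `urdhva_multiply` and the `width` parameter are hypothetical, and the carry is propagated sequentially rather than with a Carry Look Ahead adder. It only shows how each output bit is formed from the crosswise bit products that fall in the same column.

```python
def urdhva_multiply(a: int, b: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit integers column by column.

    Illustrative software model only (hypothetical helper, not the
    paper's RTL): column `col` sums every bit product a_i * b_j with
    i + j == col, then the carry moves on to the next column.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")

    # Bit lists, least significant bit first.
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    result = 0
    carry = 0
    for col in range(2 * width - 1):
        # Valid index range for i so that both i and col - i are in [0, width).
        lo = max(0, col - width + 1)
        hi = min(col, width - 1)
        # Crosswise step: add all partial products belonging to this column.
        column_sum = carry + sum(a_bits[i] & b_bits[col - i] for i in range(lo, hi + 1))
        result |= (column_sum & 1) << col  # low bit becomes the output bit
        carry = column_sum >> 1            # the rest carries into the next column

    # Whatever carry remains forms the most significant bit.
    result |= carry << (2 * width - 1)
    return result
```

In hardware, the column sums can be formed in parallel, which is where a fast adder such as the CLA mentioned in the abstract would reduce delay; the loop here is sequential only for readability.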
Adaptive Cruise Control (ACC) is an essential advanced driver-assistance system (ADAS) for autonomous vehicles, preventing all kinds of collisions in complex traffic and road situations. Ramp merging has been considered as one of ...
As intelligent sensing and video surveillance systems continue to improve, their applications in different fields have also emerged. In this study, the smart sports video tracking and safety solutions integrating t...
Reliable identification and authentication methods are a growing concern in various domains. Although facial, fingerprint and iris recognition are common options in the field of biometrics, their inherent...
Human action recognition is a popular research field in healthcare and is essential for identifying abnormal patient activities to estimate their psychological state. Epileptic seizures are a common neurol...
The intelligent power plant is the future development direction of power generation enterprises, and it is the only way for power generation enterprises to realize the deep integration of information and industrialization...
ISBN (print): 9781728196817
We present a novel real-time visual odometry framework for a stereo setup of a depth and high-resolution event camera. Our framework balances accuracy and robustness against computational efficiency towards strong performance in challenging scenarios. We extend conventional edge-based semi-dense visual odometry towards time-surface maps obtained from event streams. Semi-dense depth maps are generated by warping the corresponding depth values of the extrinsically calibrated depth camera. The tracking module updates the camera pose through efficient, geometric semi-dense 3D-2D edge alignment. Our approach is validated on both public and self-collected datasets captured under various conditions. We show that the proposed method performs comparably to state-of-the-art RGB-D camera-based alternatives in regular conditions, and outperforms them in challenging conditions such as high dynamics or low illumination.
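For readers unfamiliar with time-surface maps, the following is a minimal sketch of one common formulation, in which each pixel stores an exponentially decayed value of the time since its most recent event. The abstract does not give the authors' exact definition, so the function name `time_surface`, the `(x, y, t)` event layout, the decay constant `tau`, and the choice to ignore polarity are all assumptions made for illustration.

```python
import math


def time_surface(events, width, height, t_ref, tau=0.05):
    """Build a time-surface map from an event stream (illustrative sketch).

    events : iterable of (x, y, t) tuples, t in seconds; polarity is
             ignored here (an assumption, not taken from the abstract).
    t_ref  : reference time at which the surface is evaluated.
    tau    : decay constant in seconds (hypothetical default).

    Returns a height x width list of lists with values in [0, 1]:
    1.0 where an event fired exactly at t_ref, decaying towards 0 for
    older events, and 0.0 where no event has been seen.
    """
    # Timestamp of the most recent event at each pixel (None = no event yet).
    last = [[None] * width for _ in range(height)]
    for x, y, t in events:
        if t <= t_ref and (last[y][x] is None or t > last[y][x]):
            last[y][x] = t

    # Exponential decay of the elapsed time since the last event.
    return [
        [0.0 if t is None else math.exp(-(t_ref - t) / tau) for t in row]
        for row in last
    ]
```

Because recently active pixels tend to lie on moving edges, a map of this kind gives an edge-like image that semi-dense edge alignment can operate on.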