This paper tackles the front-back ambiguity problem in speaker localization when the audio signals are captured by a symmetric microphone array. To this end, a deep neural network is proposed with an attention-based mechanism designed to assign different weights to features obtained from individual microphones. In support, a real dataset of synchronized multichannel audio signals captured by a large linear microphone array is introduced, along with manual annotations. The experimental results demonstrate the effectiveness of the proposed method over other approaches. In particular, a reduction of more than 50% in Equal Error Rate (EER) is achieved compared with the single-channel case. The designed multi-channel self-attention mechanism also brings further improvements. The dataset and source code will be released.
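The idea of weighting per-microphone features with attention can be illustrated with a minimal sketch. It is not the network from the paper: the function name `channel_attention`, the scaled dot-product scoring, and the random `query` vector are all assumptions made for illustration, and in a trained model the query would be learned.

```python
import numpy as np

def channel_attention(features, query):
    """Fuse per-microphone features with softmax attention weights.

    features: (n_channels, dim) array, one feature vector per microphone.
    query:    (dim,) scoring vector (hypothetical; learned in a real model).
    """
    # Scaled dot-product score for each channel.
    scores = features @ query / np.sqrt(features.shape[1])
    # Numerically stable softmax over the channel axis.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()
    # Weighted sum of channel features gives one fused vector.
    fused = weights @ features
    return fused, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))   # 8 microphones, 16-dim features
query = rng.normal(size=16)
fused, weights = channel_attention(features, query)
```

The weights are positive and sum to one, so the fused vector is a convex combination of the per-microphone features, with channels that score higher contributing more.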
ISBN: 9780992862633 (print)
In this paper we propose a robust and efficient method to utilize the spatial information provided by a distributed microphone array for acoustic scene analysis. In our approach, similarly to the cepstrum, which is widely used as a spectral feature, the logarithm of the amplitude in a multichannel observation is converted to a feature vector by a linear orthogonal transformation. The spatial information of the acoustic scene is then represented in the spatial feature space. This approach does not require the positions of the microphones and is not sensitive to synchronization mismatch between channels, both of which make the method suitable for use with a distributed microphone array. Experimental results using real-life environmental sounds show the validity of our approach even when a smaller feature dimension than the original one is used.
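The transformation described in this abstract can be sketched as follows. The abstract only states that a linear orthogonal transformation is applied to the log-amplitudes; the choice of PCA (eigenvectors of the log-amplitude covariance) as that transformation, the function name `spatial_features`, and the synthetic input are assumptions for illustration only.

```python
import numpy as np

def spatial_features(amplitudes, n_keep=None, eps=1e-12):
    """Project per-channel log-amplitudes onto an orthogonal basis.

    amplitudes: (n_frames, n_channels) non-negative amplitudes.
    n_keep:     number of leading components to keep (None keeps all).
    """
    log_amp = np.log(amplitudes + eps)
    # Covariance of the log-amplitudes across channels.
    centered = log_amp - log_amp.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / len(centered)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order]
    if n_keep is not None:
        basis = basis[:, :n_keep]
    # Orthogonal projection: no microphone positions are needed.
    return log_amp @ basis, basis

rng = np.random.default_rng(0)
amps = np.abs(rng.normal(size=(200, 6))) + 0.1   # synthetic, 6 channels
feats, basis = spatial_features(amps, n_keep=3)
```

Keeping only the leading components (`n_keep=3` here) corresponds to using a smaller feature dimension than the number of channels.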
ISBN: 9781479988518 (print)
In this paper we propose a robust and efficient method to utilize the spatial information provided by a distributed microphone array for acoustic scene analysis. In our approach, similarly to the cepstrum, which is widely used as a spectral feature, the logarithm of the amplitude in a multichannel observation is converted to a feature vector by a linear orthogonal transformation. The spatial information of the acoustic scene is then represented in the spatial feature space. This approach does not require the positions of the microphones and is not sensitive to synchronization mismatch between channels, both of which make the method suitable for use with a distributed microphone array. Experimental results using real-life environmental sounds show the validity of our approach even when a smaller feature dimension than the original one is used.