Conventional post-filtering methods for multi-channel noise reduction rely only on the multichannel input signals and neglect the noise reduction capability of the microphone array beamformer, which results in overestimation of the noise power spectral density (PSD) and consequently in suboptimal filters in the minimum mean-square-error (MMSE) sense. This paper proposes a novel microphone array post-filter based on accurate PSD estimation: the beamformer output of the microphone array is used to estimate the noise PSD, and a two-step noise reduction method is employed to obtain an accurate post-filter gain function. An error analysis is also given to highlight the advantage of the proposed algorithm over the conventional Zelinski and McCowan post-filters. The performance advantages of the proposed post-filter are demonstrated in terms of segmental SNR (SegSNR), short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and deep noise suppression mean opinion score (DNSMOS).
With the progress of artificial intelligence (AI) technology, home appliances are becoming more advanced to enhance our quality of life. Many smart devices support speech interfaces, including voice commands and user location tracking. However, robotic vacuum cleaners generate strong ego-noise that distorts microphone signals, making it difficult to estimate the user's location. To solve this problem, we propose a real-time sound source localization (SSL) system for a robotic vacuum cleaner equipped with a microphone array. We design a system that consists of speech enhancement, voice activity detection (VAD), and SSL modules. The speech enhancement module includes TRU-Net-Light, which requires less computation than the tiny recurrent U-net (TRU-Net) while achieving similar speech enhancement performance. TRU-Net-Light reduces the number of channels to shrink the model size and applies frequency-axis multihead self-attention to boost representational capacity. The finite state machine-based VAD is designed to detect voice-active periods using the output of the speech enhancement module. Furthermore, we present a mask-weighted difference correlation vector and singular value decomposition (SVD) with smoothed coherence transform (DSVD-SCOT) that achieves robust localization performance in severely noisy environments. In experiments on a robotic vacuum cleaner, the localization accuracy of the SSL system was 97.9% and 84.0% for signal-to-noise ratios (SNRs) of -3 and -8 dB, respectively. The proposed system ran in real time, with a real-time factor (RTF) of 0.378, on a single Kryo 585 Silver core of the RB5 platform. A demo of the proposed system is available at https://***/3d3Cr-cs9aY.
Advances in passive acoustic monitoring (PAM) have highlighted the importance of recording devices and audio recognition techniques in ecosystem monitoring. This study introduces Chirparray, a cost-effective and easily assembled microphone array for long-term ecoacoustic monitoring of outdoor ecosystems. The Chirparray system features a four-channel microphone array that estimates sound source directions, aids in identifying individual animals and supports detailed behavioural analysis. Unlike previous microphone arrays, Chirparray is low-cost, low-power and waterproof, making it ideal for extended field monitoring. It is fully open source and is constructed from readily available materials, ensuring broad accessibility for applications ranging from local citizen projects to large-scale landscape recordings. This study details the power consumption, storage consumption and localization performance of Chirparray. Current measurements show that a small solar-power set-up can operate the system continuously. Localization tests using loudspeakers have yielded promising results. Chirparray is a compact, energy-efficient microphone array designed for long-term outdoor recording, offering considerable advantages in ecoacoustic monitoring. Its exceptionally low power consumption allows for efficient and flexible deployment, making it an ideal solution for extended field monitoring and large-scale landscape recordings.
Multi-channel acoustic signal processing is a well-established and powerful tool to exploit the spatial diversity between a target signal and non-target or noise sources for signal enhancement. However, the textbook s...
Growing interest in microphone array technology has been observed in the automotive industry and, in this work specifically, for Active Noise Control (ANC) systems. However, human presence always limits the use of microphone arrays at the driver's seat in driving conditions. Because this is often the most important position in the car cabin, a wearable microphone array is particularly interesting. In this paper, a wearable helmet microphone array is presented featuring 32 microphones arranged over the surface of a helmet, which also integrates a specially designed Analog-to-Digital (A/D) converter, delivering digital signals over the Automotive Audio Bus (A2B). Digital signals are collected using a control unit located in the passenger compartment. The control unit can deliver either digital signals to a personal computer or analog signals to an external acquisition system, by means of Digital-to-Analog (D/A) converters. A prototype was built and acoustically characterized to calculate the beamforming filter matrix required to convert the recordings (pressure signals) into Ambisonics signals (a spatial audio format). The proposed solution was compared to the reference spherical microphone array of the last decade, demonstrating better performance in sound source localization at low frequencies, where ANC systems are most effective.
This paper introduces a novel technique for estimating the signal power spectral density to be used in the transfer function of a microphone array post-filter. The technique is a generalization of the existing Zelinski post-filter, which uses the auto- and cross-spectral densities of the array inputs to estimate the signal and noise spectral densities. The Zelinski technique, however, assumes zero cross-correlation between the noise on different sensors. This assumption is inaccurate, particularly at low frequencies, and for arrays with closely spaced sensors, and thus the corresponding post-filter is suboptimal in realistic noise conditions. In this paper, a more general expression of the post-filter estimation is developed based on an assumed knowledge of the complex coherence of the noise field. This general expression can be used to construct a more appropriate post-filter in a variety of different noise fields. In experiments using real noise recordings from a computer office, the modified post-filter results in significant improvement in terms of objective speech quality measures and speech recognition performance using a diffuse noise model.
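As a rough illustration of the estimation this abstract describes, the sketch below computes a post-filter gain for a single frequency bin from the auto- and cross-spectral densities of time-aligned channels. The function name `postfilter_gain`, the dictionary layout of the inputs, the cap on the coherence, and the clipping of the gain to [0, 1] are assumptions made for the example and are not taken from the paper. With zero assumed noise coherence the estimate reduces to the Zelinski form.

```python
from itertools import combinations


def postfilter_gain(auto_psd, cross_psd, coherence=None):
    """Post-filter gain at one frequency bin (illustrative sketch).

    auto_psd:  list of N real auto-spectral densities, one per channel.
    cross_psd: dict {(i, j): value} of cross-spectral densities for i < j.
    coherence: dict {(i, j): value} of the assumed noise coherence, or
               None to assume uncorrelated noise (the Zelinski case).
    """
    n = len(auto_psd)
    estimates = []
    for i, j in combinations(range(n), 2):
        gamma = 0.0 if coherence is None else coherence[(i, j)].real
        # Assumption: cap the coherence so the denominator stays away from 0.
        gamma = min(gamma, 0.99)
        numerator = cross_psd[(i, j)].real - 0.5 * gamma * (auto_psd[i] + auto_psd[j])
        estimates.append(numerator / (1.0 - gamma))
    signal_psd = sum(estimates) / len(estimates)  # averaged over sensor pairs
    input_psd = sum(auto_psd) / n                 # averaged over sensors
    # Assumption: clip to [0, 1] so estimation noise cannot yield an invalid gain.
    return min(max(signal_psd / input_psd, 0.0), 1.0)
```

For three channels with unit signal PSD, unit noise PSD and a noise coherence of 0.5 (auto-spectra of 2.0, cross-spectra of 1.5), the zero-coherence estimate returns 0.75, whereas passing the coherence returns 0.5, the value expected for equal signal and noise power.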
The tangent line method (TLM) was originally proposed for loudspeaker arrays to generate curvilinear acoustic beams. In this study, the TLM was applied to a microphone array. Based on reciprocity, the TLM-based microphone array can be used to form curvilinear beams. A curvilinear beam is produced as the envelope of a set of tangent lines. The tangent lines, which are straight beams with different angles, are generated by applying a delay-and-sum (DAS) beamformer. Because the envelope length is specified, the TLM offers better distance discrimination in sensitivity than the DAS beamformer. Case studies indicate that the TLM also provides better directivity than the DAS beamformer. The TLM is realizable with fixed delay times for each microphone unless the curvilinear trajectory is altered according to the reproduction frequency. Hence, the same simplicity of implementation as the DAS beamformer can be achieved by optimizing the curvilinear trajectory on a frequency-averaged basis. Optimization is conducted such that the acoustic contrast between the focal point and elsewhere is maximized. In summary, the frequency-averaged optimal TLM can be a fixed beamformer with better performance than, and the same simplicity as, the DAS beamformer.
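The DAS beamformer that generates each straight beam can be sketched as follows. This is only the far-field, plane-wave building block on a uniform linear array, not the TLM itself; the function name `das_response`, the array geometry and the speed of sound are assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def das_response(n_mics, spacing_m, freq_hz, steer_deg, arrive_deg):
    """Magnitude response of a far-field DAS beamformer on a uniform linear array."""

    def delay(index, angle_deg):
        # Relative arrival delay of a plane wave at microphone `index`.
        return index * spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND

    total = 0j
    for m in range(n_mics):
        # Delay left over after the fixed steering delay has been compensated.
        residual = delay(m, arrive_deg) - delay(m, steer_deg)
        total += cmath.exp(-2j * math.pi * freq_hz * residual)
    # Normalize so that a wave from the steering direction has unit response.
    return abs(total) / n_mics
```

A wave arriving from the steering direction sums coherently and gives a response of 1, while other directions are attenuated. Because the steering delays are fixed per microphone, the same structure applies to each straight beam angle.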
For array-based acoustic source enhancement, variants of multi-channel Wiener filters are commonly used. The approach includes a Wiener post-filter that requires the simultaneous estimation of the power spectral density (PSD) of the target source and of noise sources for each time-frame. Conventional methods generally do not exploit prior knowledge, such as sparsity of the source, in solving this simultaneous estimation problem. We show that, for common scenarios, the simultaneous PSD estimation with consideration of prior knowledge can be formulated as a convex optimization problem with linear constraints. We use monotone operator splitting (MOS) to solve the constrained optimization problem. Our experiments confirm that the proposed method improves the accuracy of the noise PSD estimation, and that the resulting enhanced target signal is of higher quality.
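To make the role of the two PSD estimates concrete, a minimal sketch of a single-bin Wiener post-filter gain is given below. The function name `wiener_gain` and the optional gain floor are assumptions for illustration; the sketch does not implement the convex optimization or the monotone operator splitting described in the abstract.

```python
def wiener_gain(target_psd, noise_psd, floor=0.0):
    """Wiener gain for one time-frequency bin from target and noise PSD estimates."""
    total = target_psd + noise_psd
    if total <= 0.0:
        # No energy in this bin: fall back to the floor (an assumed convention).
        return floor
    return max(target_psd / total, floor)
```

With a target PSD of 1.0, the gain is 0.5 for a noise PSD of 1.0 and drops to 0.25 if the noise PSD is estimated as 3.0, which shows how an overestimated noise PSD leads to over-suppression of the target.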
Authors:
Grenier, Y.
ENST, Dept. Signal, 46 rue Barrault, F-75634 Paris 13, France
This paper describes a microphone array for speech recording in car environments. The array is designed for hands-free radiotelephony, and is also used as a front-end for an automatic speech recognition system (this study was carried out within the European ESPRIT project ARS, "adverse environment recognition of speech"). We first summarise the adaptive beamforming techniques that we have used. We then describe several aspects of the implementation of the array (configuration, design of fixed beamformers, adaptation, complexity reduction). In the last section, we evaluate the performance of the array. Two measures of performance have been retained: one is the signal-to-noise ratio, and the other is the score obtained with the speech recognition system.
Robust distant speech recognition (DSR) is necessary in many speech technology applications using multiple microphones but has received only limited treatment in the literature. In this paper, we work on communication with a vehicle voice-controlled system, which is one application of DSR. Two approaches to DSR are i) signal-level combination using beamforming followed by automatic speech recognition (ASR), and ii) word hypothesis-level combination using several speech recognition engines followed by confusion network combination or by recognizer output voting error reduction (ROVER). In addition to these approaches, it is possible to examine training-level combination by training the recognizer on audio signals from multiple channels (microphones). In this paper, the authors investigate how these methods can be leveraged for in-vehicle ASR using the CU-Move corpus. The authors propose various combinations of these three methods to find an optimum structure for in-vehicle ASR. The authors also investigate the effect of speaker adaptation (SA). The authors' experience shows that applying SA on individual channels and merging the results with ROVER reduces the negative effects of SA reported by others in the field, and illustrates the overall improvement obtained with front-end enhancement techniques in DSR.