ISBN: 9781509066315 (print)
Nonlinear spectral mapping models based on supervised learning have been successfully applied to speech enhancement. However, as supervised approaches, they require a large amount of labelled data (noisy-clean speech pairs) for training. In addition, their performance under unseen noise conditions is not guaranteed, which is a common weakness of supervised learning. In this study, we propose an unsupervised learning approach for speech enhancement: the denoising autoencoder with linear regression decoder (DAELD) model. The DAELD is trained with noisy speech as both input and target output in a self-supervised manner. By properly setting a shrinkage threshold on the internal hidden representations, noise can be removed during reconstruction from the hidden representations via the linear regression decoder. Speech enhancement experiments were carried out to test the proposed model. The results confirm that the DAELD achieves comparable, and sometimes better, enhancement performance than conventional supervised speech enhancement approaches in both seen and unseen noise environments. Moreover, we observe that the DAELD tends to achieve higher performance when the training data cover more diverse noise types and signal-to-noise ratio (SNR) levels.
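The shrink-then-decode step described in this abstract can be illustrated with a minimal sketch. It assumes soft thresholding as the shrinkage rule and uses hand-picked decoder weights; the abstract does not specify the exact shrinkage function, the threshold value, or how the decoder is fitted, so `soft_threshold`, `linear_decode`, `tau`, and all numbers below are hypothetical.

```python
def soft_threshold(hidden, tau):
    """Shrink each hidden activation toward zero by tau (assumed shrinkage rule)."""
    return [max(abs(v) - tau, 0.0) * (1.0 if v >= 0 else -1.0) for v in hidden]


def linear_decode(hidden, weights, bias):
    """Linear regression decoder: y = W h + b, one output per row of W."""
    return [sum(w * v for w, v in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]


# Toy hidden code: two strong activations and two small ones standing in for noise.
hidden = [0.9, -0.05, 0.02, -0.7]
# Hypothetical decoder weights (2 output bins, 4 hidden units) and biases.
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0]

shrunk = soft_threshold(hidden, tau=0.1)   # small activations are zeroed out
enhanced = linear_decode(shrunk, weights, bias)
print(shrunk, enhanced)
```

In this toy case the two small activations fall below the threshold and are set to zero, so they no longer contribute to the reconstruction.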
ISBN: 9781479999880 (print)
This paper compares unsupervised sequence training techniques for deep neural networks (DNNs) for broadcast transcription. Recent progress in digital archiving of broadcast content has made it easier to access large amounts of speech data. Such archived data are helpful for acoustic and language modeling in live-broadcast captioning based on automatic speech recognition (ASR). In Japanese broadcasts, however, archived programs such as sports news do not always have closed captions, which are typically used as references. Thus, unsupervised adaptation techniques are needed to improve performance even when a DNN is used as the acoustic model. In this paper, we compare three unsupervised sequence adaptation techniques: maximum a posteriori (MAP), entropy minimization, and Bayes risk minimization. Experimental results for transcribing sports news programs show that Bayes risk minimization, which reflects information about expected errors, yields the best ASR performance, while MAP, the simplest method of unsupervised sequence adaptation, gives comparable results.
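To make the three criteria named above concrete, the following sketch evaluates them on a toy N-best list. It is only an illustration of the quantities involved, not the paper's DNN training procedure: the posteriors, the pairwise error counts in `loss`, and the helper names are all hypothetical.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of the posterior over hypotheses."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def expected_risk(posteriors, loss):
    """Bayes risk of each hypothesis i: sum over j of P(j) * loss[i][j]."""
    return [sum(p * l for p, l in zip(posteriors, row)) for row in loss]


# Hypothetical posteriors of three competing transcriptions of one utterance.
posteriors = [0.5, 0.3, 0.2]
# Hypothetical pairwise error counts: loss[i][j] = errors of i if j were correct.
loss = [[0, 1, 2],
        [1, 0, 3],
        [2, 3, 0]]

map_index = max(range(len(posteriors)), key=lambda i: posteriors[i])
risks = expected_risk(posteriors, loss)
min_risk_index = min(range(len(risks)), key=lambda i: risks[i])
print(map_index, entropy(posteriors), risks, min_risk_index)
```

MAP uses only the top-scoring hypothesis, entropy measures how peaked the posterior is, and the Bayes risk weights each competitor by how many errors it would imply, which is the "information about expected errors" the abstract refers to.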
Objective: With the rapid growth of high-speed deep-tissue imaging in biomedical research, there is an urgent need for a robust and effective denoising method that retains morphological features for subsequent texture analysis and segmentation. Conventional denoising filters and models can easily suppress perturbative noise in high-contrast images; however, for multiphoton images with a low photon budget, a high detector gain not only boosts the signal but also introduces significant background noise. In such a stochastic resonance imaging regime, subthreshold signals may be detectable with the help of noise, so a denoising filter that removes noise without sacrificing important cellular features, such as cell boundaries, is desirable. Method: We propose a convolutional neural network-based denoising autoencoder method, a fully convolutional deep denoising autoencoder (DDAE), to improve the quality of three-photon fluorescence (3PF) and third-harmonic generation (THG) microscopy images. Results: The average of 200 acquired images of a given location served as the low-noise target for DDAE training. Compared with conventional denoising methods, our DDAE model achieves a better signal-to-noise ratio (28.86 and 21.66 for 3PF and THG, respectively), structural similarity (0.89 and 0.70 for 3PF and THG, respectively), and preservation of nuclear or cellular boundaries (F1-score of 0.662 and 0.736 for 3PF and THG, respectively). This shows that the DDAE offers a better trade-off between structural similarity and preservation of signal regions. Conclusions: The results of this study validate the effectiveness of the DDAE system for boundary-preserving image denoising. Clinical Impact: The proposed deep denoising system can enhance the quality of microscopic images and effectively support clinical evaluation and assessment.
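The low-noise training target mentioned in the Results is an average over repeated acquisitions of the same location. A minimal sketch of that pixel-wise averaging follows, assuming each frame is a nested list of intensities; the abstract uses 200 frames, while the three tiny frames and the helper name `average_frames` here are hypothetical.

```python
def average_frames(frames):
    """Pixel-wise mean over repeated acquisitions of the same field of view."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(frame[r][c] for frame in frames) / n for c in range(width)]
            for r in range(height)]


# Three hypothetical 2x2 noisy frames of one location.
frames = [
    [[10.0, 0.0], [4.0, 2.0]],
    [[12.0, 2.0], [4.0, 0.0]],
    [[8.0, 1.0], [4.0, 4.0]],
]
target = average_frames(frames)
print(target)
```

Each individual noisy frame would then be paired with `target` as an (input, target) example for training the denoiser.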
ISBN: 9781538646595 (print)
Reverberation, which is generally caused by sound reflections from walls, ceilings, and floors, can severely degrade the performance of acoustic applications. Because it arises from a complicated combination of attenuation and time-delay effects, reverberation is difficult to characterize, and effectively retrieving anechoic speech signals from reverberant ones remains a challenging task. In the present study, we propose a novel integrated deep and ensemble learning algorithm (IDEA) for speech dereverberation. The IDEA consists of offline and online phases. In the offline phase, we train multiple dereverberation models, each aiming to precisely dereverberate speech signals in a particular acoustic environment; a unified fusion function is then estimated to integrate the information from the multiple dereverberation models. In the online phase, an input utterance is first processed by each of the dereverberation models, and the outputs of all models are then integrated to generate the final anechoic signal. We evaluated the IDEA on designed acoustic environments, including both matched and mismatched conditions between the training and testing data. Experimental results confirm that the proposed IDEA outperforms a single deep-neural-network-based dereverberation model with the same model architecture and training data.
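The online phase described above, where every environment-specific model processes the utterance and the outputs are fused, can be sketched as follows. The abstract does not say what form the fusion function takes, so this example assumes a fixed weighted sum purely for illustration; `fuse`, the weights, and the toy model outputs are hypothetical.

```python
def fuse(model_outputs, weights):
    """Weighted sum of per-model outputs, element by element (assumed fusion rule)."""
    length = len(model_outputs[0])
    return [sum(w * out[k] for w, out in zip(weights, model_outputs))
            for k in range(length)]


# Hypothetical spectral estimates from two environment-specific models.
model_outputs = [[1.0, 2.0],
                 [3.0, 4.0]]
# Hypothetical fusion weights; in the offline phase these would be estimated.
weights = [0.25, 0.75]

fused = fuse(model_outputs, weights)
print(fused)
```

A learned fusion function could replace the fixed weights without changing the overall flow of running every model and then combining their outputs.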
ISBN: 9781479999897 (print)
This paper compares unsupervised sequence training techniques for deep neural networks (DNNs) for broadcast transcription. Recent progress in digital archiving of broadcast content has made it easier to access large amounts of speech data. Such archived data are helpful for acoustic and language modeling in live-broadcast captioning based on automatic speech recognition (ASR). In Japanese broadcasts, however, archived programs such as sports news do not always have closed captions, which are typically used as references. Thus, unsupervised adaptation techniques are needed to improve performance even when a DNN is used as the acoustic model. In this paper, we compare three unsupervised sequence adaptation techniques: maximum a posteriori (MAP), entropy minimization, and Bayes risk minimization. Experimental results for transcribing sports news programs show that Bayes risk minimization, which reflects information about expected errors, yields the best ASR performance, while MAP, the simplest method of unsupervised sequence adaptation, gives comparable results.