ISBN: 9781510848764 (print)
A challenge for speech recognition on voice-controlled household devices, such as the Amazon Echo or Google Home, is robustness against interfering background speech. In this far-field speech recognition problem, another person or media device in proximity can produce background speech that interferes with the device-directed speech. We expand on our previous work on device-directed speech detection in the far-field speech setting and introduce two approaches for robust acoustic modeling. Both methods are based on the idea of using an anchor word taken from the device-directed speech. Our first method employs a simple yet effective normalization of the acoustic features by subtracting the mean derived over the anchor word. The second method utilizes an encoder network that projects the anchor word onto a fixed-size embedding, which serves as an additional input to the acoustic model. The encoder network and acoustic model are jointly trained. Results on an in-house dataset reveal that, in the presence of background speech, the proposed approaches can achieve up to 35% relative word error rate reduction.
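The first method in the abstract above, subtracting a mean computed over the anchor word, can be illustrated with a minimal sketch. The function name, the frame-index arguments, and the list-of-lists feature layout are assumptions made for illustration; the abstract does not specify the feature type or how the anchor word boundaries are obtained.

```python
def anchor_mean_normalize(features, anchor_start, anchor_end):
    """Subtract the per-dimension mean of the anchor-word frames from every frame.

    features: list of frames, each a list of floats (hypothetical layout).
    anchor_start, anchor_end: frame indices [start, end) of the anchor word,
    assumed to be known from an upstream detector.
    """
    anchor = features[anchor_start:anchor_end]
    if not anchor:
        raise ValueError("anchor segment is empty")
    dim = len(features[0])
    # Mean of each feature dimension, computed over the anchor frames only.
    mean = [sum(frame[d] for frame in anchor) / len(anchor) for d in range(dim)]
    # Apply the anchor-derived mean to the whole utterance.
    return [[frame[d] - mean[d] for d in range(dim)] for frame in features]


utterance = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
normalized = anchor_mean_normalize(utterance, 0, 2)
```

Here the first two frames play the role of the anchor word, so the anchor mean is `[2.0, 3.0]` and it is subtracted from all three frames.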
ISBN: 9781728162157 (print)
The analysis of glandular morphology is a crucial step in determining the presence and grade of cancer. The rise of computational pathology has led to the development of automated segmentation to replace time-consuming manual segmentation. Although existing encoder-decoder networks have made significant progress, the downsampling operation causes a loss of fine-grained information, which degrades boundary localization, especially in malignant cases. In this paper, we propose a maximal information complemented refinement network based on UNet. We extend the skip connection with two information complements: we aggregate spatial detail information by reusing low-level features, and we introduce semantic information through high-level feature guidance. In addition, a weighted cross-entropy loss and a generalized Dice loss are used to tackle fuzzy boundaries and class imbalance. We evaluated our model against a dozen recent deep learning models on the 2015 MICCAI Gland Segmentation challenge (GlaS) dataset. Extensive experiments show that our proposal achieves the best overall performance and substantially improves performance on malignant cases.
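The two loss terms named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the generalized Dice loss follows the commonly used inverse-squared-volume class weighting, the cross-entropy uses per-class weights, and the flattened pixel layout, the function names, and the way the two terms would be combined are all assumptions.

```python
import math


def generalized_dice_loss(probs, labels, num_classes, eps=1e-7):
    """probs: per-pixel class probabilities; labels: per-pixel class indices."""
    numer = 0.0
    denom = 0.0
    for c in range(num_classes):
        ref = [1.0 if y == c else 0.0 for y in labels]
        pred = [p[c] for p in probs]
        # Inverse squared class volume: rare classes get larger weights.
        w = 1.0 / (sum(ref) ** 2 + eps)
        numer += w * sum(r * q for r, q in zip(ref, pred))
        denom += w * sum(r + q for r, q in zip(ref, pred))
    return 1.0 - 2.0 * numer / (denom + eps)


def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    """Mean negative log-likelihood, scaled by a weight for the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= class_weights[y] * math.log(p[y] + eps)
    return total / len(labels)


labels = [0, 0, 1]
perfect = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
uncertain = [[0.6, 0.4], [0.6, 0.4], [0.4, 0.6]]
gdl_perfect = generalized_dice_loss(perfect, labels, 2)
gdl_uncertain = generalized_dice_loss(uncertain, labels, 2)
wce_perfect = weighted_cross_entropy(perfect, labels, [1.0, 2.0])
wce_uncertain = weighted_cross_entropy(uncertain, labels, [1.0, 2.0])
```

A training objective could sum the two terms, possibly with a balancing coefficient; the abstract does not state how they are combined.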
ISBN: 9783031500787; 9783031500770 (print)
In this work, an improved end-to-end U-Net structure, a hierarchical multi-scale interconnection network (HMINet), is proposed to make full use of the information contained in different feature maps in the encoder and decoder to improve the accuracy of medical image segmentation. The network consists of two main components: a multi-scale fusion unit (MSF) and a multi-head feature enhancement unit (MFE). In the encoder, the multi-scale fusion unit fuses information across feature maps of different scales. By using convolution at different levels, a wider range of context information can be captured and fused into a more comprehensive feature representation. In the decoder, the feature enhancement units attend to the coordinate and channel information across feature maps and then concatenate the encoded feature maps step by step to maximize the use of information from different feature maps. These feature maps are joined by a skip connection mechanism designed to retain more feature information and minimize information loss. The proposed method is tested on four public medical datasets and compared with other classical image segmentation models. The results show that HMINet can significantly improve the accuracy of medical image segmentation tasks and exceeds the performance of other models in most cases.
ISBN: 9781665436496 (print)
Semantic segmentation is considered one of the basic steps in understanding image content. When multi-spectral images are used together with color images for semantic segmentation, better results are obtained because the multi-spectral images provide complementary information. In this paper, a semantic segmentation method is developed in which images obtained from CCD and thermal sensors are used together. The proposed method uses convolutional neural networks in an encoder-decoder architecture. The experiments show that the developed method produces better numerical and visual results than previously published works.
ISBN: 9781665427920 (digital); 9781665427920 (print)
Synthetic aperture radar (SAR) is often subject to strong electromagnetic interference (EMI) during electronic reconnaissance missions, which seriously weakens its surveying and mapping ability. This paper presents a novel method for SAR image interference suppression based on an encoder-decoder network, named ISEDnet. ISEDnet mainly consists of a consecutive feature extraction net (FEN), an additional encoder-decoder network (EDN), and an image supervision mechanism. The FEN extracts the features of interfered SAR images, and the EDN suppresses the interference in SAR images. The image supervision mechanism is proposed to recover the target features. The network is trained with simulated and real measurement data, and the effectiveness of ISEDnet is verified on both simulated data and Sentinel-1 satellite SAR images. Compared to the traditional notch filtering method, ISEDnet successfully suppresses different types of SAR interference and improves interference suppression performance.
ISBN: 9781509063413 (print)
Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates the components of multichannel speech enhancement and speech recognition into a single neural-network-based architecture and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently clarified. An open question is whether the speech enhancement component really gains speech enhancement (noise suppression) ability, because it is optimized based on end-to-end ASR objectives instead of speech enhancement objectives. In this paper, we address this question by conducting systematic evaluation experiments using the CHiME-4 corpus. We first show that the integrated end-to-end architecture successfully obtains adequate speech enhancement ability that is superior to that of a conventional alternative (a delay-and-sum beamformer) by observing two signal-level measures: the signal-to-distortion ratio and the perceptual evaluation of speech quality. Our findings suggest that to further increase the performance of an integrated system, we must boost the power of the latter-stage speech recognition component. However, only an insufficient amount of multichannel noisy speech data is available. Given this situation, we next investigate the effect of using a large amount of single-channel clean speech data, e.g., the WSJ corpus, for additional training of the speech recognition component. We show that our approach with clean speech significantly improves the total performance of the multichannel end-to-end architecture in multichannel noisy ASR tasks.
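The conventional baseline mentioned in the abstract above, a delay-and-sum beamformer, can be sketched in a few lines. This is a simplified time-domain illustration that assumes the per-channel delays are already known and are non-negative whole numbers of samples; practical systems estimate the delays and often work with fractional delays in the frequency domain. The function name and argument layout are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its known delay and average them.

    channels: list of equal-rate sample lists, one per microphone.
    delays: non-negative integer lag (in samples) of each channel
            relative to the earliest-arriving channel (assumed known).
    """
    # Only keep the span where every shifted channel still has samples.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# The same ramp arrives two samples later at the second microphone.
mic0 = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
enhanced = delay_and_sum([mic0, mic1], [0, 2])
```

After alignment the target signal adds coherently across channels, while uncorrelated noise would be attenuated by the averaging.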
ISBN: 9781728173337 (print)
The matched filter is widely used in traditional synthetic aperture radar systems under the criterion of maximum output signal-to-noise ratio (SNR). However, the matched filter has high side lobes (for example, 13.6 dB without windowing), resulting in severe masking of adjacent weak targets. Recently, adaptive pulse compression (APC) was developed based on the minimum mean square error (MMSE) between true ground profiles and compressed outputs. Although it yields better compression results, it requires a tremendous amount of processing time, which hinders its application. In this paper, a fast implementation named iterative adaptive pulse compression (IAPC) is designed, whereby the respective tap coefficients of all filters are jointly updated in a recursive way. We subsequently incorporate it into the classic range Doppler algorithm by changing the processing sequence, which is shown to be clearly efficient as the pulse duration increases. Moreover, an encoder-decoder architecture is developed to improve the targets' energy and thereby address the SNR dependence of both APC and IAPC. Through various stress experiments, including simulation and real data validation, the proposed method is shown to be superior to the original APC and the matched filter in terms of time efficiency and peak sidelobe ratio (PSLR), respectively.
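To make the matched filter and the PSLR metric from the abstract above concrete, the sketch below compresses a pulse by correlating it with itself and measures the ratio of the largest side lobe to the main peak. As a deliberate simplification it uses a real-valued Barker-13 code rather than the chirp waveform typical of SAR, so its PSLR differs from the figure quoted in the abstract. The function names are hypothetical, and excluding only the single peak sample as the main lobe is an assumption that holds for chip-rate-sampled codes but not for oversampled chirps.

```python
import math

BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def matched_filter(received, pulse):
    """Cross-correlate a real-valued received signal with the transmitted pulse."""
    n, m = len(received), len(pulse)
    out = []
    for lag in range(-(m - 1), n):
        acc = 0.0
        for k in range(m):
            idx = lag + k
            if 0 <= idx < n:
                acc += received[idx] * pulse[k]
        out.append(acc)
    return out


def pslr_db(compressed):
    """Peak sidelobe ratio in dB, treating the peak sample as the main lobe."""
    mags = [abs(v) for v in compressed]
    peak_idx = max(range(len(mags)), key=mags.__getitem__)
    side = max(v for i, v in enumerate(mags) if i != peak_idx)
    return 20.0 * math.log10(side / mags[peak_idx])


compressed = matched_filter(BARKER13, BARKER13)
ratio = pslr_db(compressed)
```

A weak target whose echo falls below the side lobes of a strong neighbor would be masked in `compressed`, which is the limitation the adaptive methods in the abstract aim to reduce.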
ISBN: 9781728199665 (print)
Handwritten mathematical expression recognition (HMER) is an important research direction in handwriting recognition. The performance of HMER suffers from the two-dimensional structure of mathematical expressions (MEs). To address this issue, we propose a high-performance HMER model with scale augmentation and drop attention. Specifically, because the scale of MEs is unstable in both the horizontal and vertical directions, scale augmentation improves the performance of the model on MEs of various scales. An attention-based encoder-decoder network is used for extracting features and generating predictions. In addition, drop attention is proposed to further improve performance when the attention distribution of the decoder is not precise. Compared with previous methods, our method achieves state-of-the-art performance on two public datasets, CROHME 2014 and CROHME 2016.
ISBN: 9781665421744 (print)
Doppler weather radar is the most widely used ground-based convection detector with the highest resolution. Echo reflectance data from weather radar is the key reference for meteorological departments in severe convective weather forecasting and early warning, quantitative precipitation estimation (QPE), and quantitative precipitation forecasting (QPF). However, radar detection is inevitably affected by obstacles, ground clutter interference, radar echo attenuation, and other phenomena, resulting in poor data quality. It is therefore very important to correct missing or disturbed data. Meanwhile, with the rapid development of artificial intelligence in recent years, more and more meteorological researchers have begun to introduce deep learning and other machine learning methods into meteorological research such as weather radar data processing. In this paper, a deep convolutional encoder-decoder network is proposed to correct beam blockage in weather radar. The correction of radar beam blockage is treated as an image inpainting problem. This is the first attempt to use deep learning to correct radar beam blockage. Experiments show that the proposed method is significantly better than the traditional method in accuracy, error rate, false alarm rate, and other aspects. The method can directly identify and correct the blocked area, and its operating procedure is simple compared with traditional methods.
ISBN: 9781728142425 (print)
Images taken through transparent glass are usually covered with unwanted reflection layers, which can degrade the performance of many computer vision tasks, so eliminating these reflections has a wide range of applications in image processing. In fact, the areas of the reflection layer that most affect the image are often those with high reflectivity. Therefore, in this paper, we propose a deep learning model that removes the reflection from a single image. The model automatically detects the reflection area and obtains a reflection-free background image. In extensive comparative experiments and validation, the model shows excellent performance.