ISBN (print): 9781665436496
Semantic segmentation is considered one of the basic steps in understanding image content. When multi-spectral images are used together with color images, segmentation results improve because the multi-spectral images provide complementary information. In this paper, a semantic segmentation method is developed that jointly uses images obtained from CCD and thermal sensors. The proposed method uses convolutional neural networks in an encoder-decoder architecture. The experiments show that the developed method produces better numerical and visual results than previously published work.
ISBN (print): 9781509063413
Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates multichannel speech enhancement and speech recognition into a single neural-network-based architecture, and we demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently understood. An open question is whether the speech enhancement component really acquires speech enhancement (noise suppression) ability, because it is optimized with end-to-end ASR objectives instead of speech enhancement objectives. In this paper, we address this question through systematic evaluation experiments on the CHiME-4 corpus. We first show that the integrated end-to-end architecture acquires adequate speech enhancement ability that is superior to that of a conventional alternative (a delay-and-sum beamformer), as observed in two signal-level measures: the signal-to-distortion ratio and the perceptual evaluation of speech quality. Our findings suggest that, to further improve the performance of an integrated system, we must strengthen the latter-stage speech recognition component. However, only a limited amount of multichannel noisy speech data is available. Given this situation, we next investigate the effect of using a large amount of single-channel clean speech data, e.g., the WSJ corpus, for additional training of the speech recognition component. We show that our approach with clean speech significantly improves the overall performance of the multichannel end-to-end architecture on multichannel noisy ASR tasks.
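As an illustration of the first signal-level measure named above, the following is a minimal sketch of a signal-to-distortion ratio computed as a plain energy ratio in decibels. This simple form is an assumption made for illustration; evaluations on corpora such as CHiME-4 often use more elaborate definitions (for example the BSS Eval variant), and the abstract does not say which one was used. The function name `sdr_db` is hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Simplified illustrative definition, not necessarily the one used
    in the paper's evaluation.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal is silent; SDR is undefined")
    if distortion_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / distortion_energy)


# Toy example: the estimate is the clean signal scaled by 0.9.
clean = [1.0, 0.0, -1.0, 0.0]
enhanced = [0.9, 0.0, -0.9, 0.0]
print(round(sdr_db(clean, enhanced), 3))
```

A higher value means the enhanced signal is closer to the clean reference, which is how an enhancement front end can be compared against a baseline such as a delay-and-sum beamformer.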
ISBN (print): 9781728176055
Single-image blind deblurring, where the blur is caused by a combination of multiple factors, has been one of the most challenging visual tasks. Recently, many key methods for this task have been based on deep learning networks and have achieved high performance. However, most of them use only the pixel-wise L1-norm loss function to guide training, which is not sufficiently suitable or effective. In this paper, we propose Multiple Auxiliary networks (MANet) for single-image blind deblurring, which assist the L1-norm loss function and enhance the quality of the deblurred image. The main branch of MANet is an encoder-decoder structure made up of residual blocks, and the three auxiliary branches are the edge prediction branch, the multi-scale refinement branch, and the perceptual loss branch. The experimental results demonstrate that the proposed MANet obtains better deblurring performance, with more details, than state-of-the-art methods. The code is released at ***/ZERO2ER0/MANet.
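For reference, the pixel-wise L1 loss discussed above is the mean absolute difference between the predicted and the target image. The sketch below computes it for small grayscale images stored as nested lists. It is an illustrative assumption, not code from the MANet repository, and the function name `l1_loss` is hypothetical.

```python
def l1_loss(predicted, target):
    """Mean absolute error over all pixels of two equally sized 2-D images."""
    if len(predicted) != len(target):
        raise ValueError("images must have the same number of rows")
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predicted, target):
        if len(pred_row) != len(target_row):
            raise ValueError("rows must have the same length")
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return total / count


# Toy 2x2 example: a deblurred estimate compared with a sharp target.
estimate = [[0.0, 1.0], [2.0, 3.0]]
sharp = [[0.0, 0.0], [0.0, 0.0]]
print(l1_loss(estimate, sharp))  # 1.5
```

Because this loss treats every pixel independently, it carries no explicit notion of edges or perceptual similarity, which is the gap the auxiliary branches described in the abstract are meant to fill.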
ISBN (print): 9783030720834; 9783030720841
Automatic MRI brain tumor segmentation is of vital importance for disease diagnosis, monitoring, and treatment planning. In this paper, we propose a two-stage encoder-decoder based model for brain tumor subregional segmentation. Variational autoencoder regularization is used in both stages to prevent overfitting. The second-stage network adopts attention gates and is additionally trained on an expanded dataset formed from the first-stage outputs. On the BraTS 2020 validation dataset, the proposed method achieves mean Dice scores of 0.9041, 0.8350, and 0.7958, and Hausdorff distances (95%) of 4.953, 6.299, and 23.608 for the whole tumor, tumor core, and enhancing tumor, respectively. The corresponding results on the BraTS 2020 testing dataset are 0.8729, 0.8357, and 0.8205 for the Dice score, and 11.4288, 19.9690, and 15.6711 for the Hausdorff distance. The code is publicly available at https://***/shu-hai/two-stage-VAE-Attention-gate-BraTS2020.
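The Dice score reported above measures the overlap between a predicted mask and a ground-truth mask as 2|A ∩ B| / (|A| + |B|). The following minimal sketch computes it for flattened binary masks. It is not taken from the authors' repository; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(prediction, truth):
    """Dice coefficient for two flattened binary masks (sequences of 0/1)."""
    if len(prediction) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(prediction, truth) if p and t)
    total = sum(1 for p in prediction if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy example: one of two predicted voxels overlaps the ground truth.
predicted_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]
print(dice_score(predicted_mask, true_mask))  # 0.5
```

A score of 1.0 indicates perfect overlap and 0.0 indicates none, so the validation values quoted in the abstract correspond to high overlap for the whole tumor and lower overlap for the enhancing tumor.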
ISBN (print): 9781665441155
The recent surge in deep learning methods across multiple modalities has resulted in an increased interest in image captioning. Most advances in image captioning are still focused on the generation of factual-centric captions, which mainly describe the contents of an image. However, generating captions to provide a meaningful and opinionated critique of photographs is less studied. This paper presents a framework for leveraging aesthetic features encoded from an image aesthetic scorer, to synthesize human-like textual critique via a sequence decoder. Experiments on a large-scale dataset show that the proposed method is capable of producing promising results on relevant metrics relating to semantic diversity and synonymity, with qualitative observations demonstrating likewise. We also suggest the use of Word Mover's Distance as a semantically intuitive and informative metric for this task.
ISBN (print): 9781728154466
As a recorder of aircraft data, the black box is the most reliable and effective means of identifying the cause of an accident after an aircraft crash. An underwater acoustic beacon is installed in the black box to address the problem of locating it after an air accident at sea. The masking effect of ocean noise, coupled with propagation loss in the ocean, severely attenuates the signal during long-distance propagation, which makes underwater signals very difficult to detect. Inspired by the successful application of fully convolutional networks (FCN) to pixel-level image classification, an encoder-decoder network with skip connection layers, called "Unet", is proposed to enhance underwater acoustic beacon signals represented as short-time Fourier transform (STFT) images. The experimental data show that the FCN-based enhancement method achieves higher signal gain than the conventional method based on the adaptive line enhancer (ALE).
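The STFT images mentioned above are time-frequency magnitude maps in which each row is the spectrum of one short, windowed frame of the signal. The pure-Python sketch below shows how such a map could be computed. The frame length, hop size, and Hann window are illustrative assumptions rather than the paper's settings, and the function name `stft_magnitude` is hypothetical.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len/2."""
    # Periodic Hann window (assumed choice) to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


# Toy beacon-like tone that completes two cycles per 8-sample frame.
tone = [math.cos(2.0 * math.pi * 2 * n / 8) for n in range(16)]
image = stft_magnitude(tone)
print(len(image), len(image[0]))  # 3 frames, 5 frequency bins
```

In this toy case the energy concentrates in frequency bin 2 of every frame. A narrowband beacon tone would likewise appear as a horizontal line in the time-frequency image, which is the kind of structure an image-to-image network can learn to enhance against background noise.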
ISBN (print): 9781509066315
Dynamic scene deblurring is a challenging problem because blur arises from a variety of sources. Many deep learning based approaches train end-to-end deblurring networks and achieve good performance. However, the architectures and parameters of these methods are fixed after training, so they need deeper network architectures and more parameters to adapt to different blurry images, which increases the computational complexity. In this paper, we propose a local correlation block (LCBlock), which adaptively adjusts the weights of features according to the blurry input, and we use it to construct a dynamic scene deblurring network named LCNet. Experimental results show that the proposed LCNet achieves comparable performance with shorter running time and smaller network size than state-of-the-art learning-based methods.
ISBN (print): 9781665405409
Recently, several Transformer-based methods have been presented to improve image segmentation. However, because the Transformer requires regular square images and has difficulty capturing local feature information, segmentation performance is seriously affected. In this paper, we propose a novel encoder-decoder network named TCRNet, in which a Transformer, a convolutional neural network (CNN), and a recurrent neural network (RNN) complement each other. In the encoder, we extract and concatenate the feature maps from the Transformer and the CNN to effectively capture both global and local feature information of images. Then, in the decoder, we use a convolutional RNN in the proposed recurrent decoding unit to refine the decoder's feature maps for finer prediction. Experimental results on three medical datasets demonstrate that TCRNet effectively improves segmentation precision.
ISBN (print): 9780791858776
In this paper, we present a data-driven approach to construct a reduced-order model (ROM) for unsteady flow fields and fluid-structure interaction. The proposed approach relies on (i) a projection of the high-dimensional data from the Navier-Stokes equations onto a low-dimensional subspace using the proper orthogonal decomposition (POD) and (ii) integration of the low-dimensional model with recurrent neural networks. For the hybrid ROM formulation, we consider long short-term memory networks with an encoder-decoder architecture, which are a special variant of recurrent neural networks. The mathematical structure of recurrent neural networks embodies a non-linear state-space form of the underlying dynamical behavior. This particular attribute of an RNN makes it suitable for non-linear unsteady flow problems. In the proposed hybrid RNN method, the spatial and temporal features of the unsteady flow system are captured separately. Time-invariant modes obtained by low-order projection embody the spatial features of the flow field, while the temporal behavior of the corresponding modal coefficients is learned via recurrent neural networks. The effectiveness of the proposed method is first demonstrated on a canonical problem of flow past a cylinder at low Reynolds number. As a practical marine/offshore engineering demonstration, we have applied the proposed data-driven framework to the prediction of vortex-induced vibrations of a flexible offshore riser at high Reynolds number and examined its reliability.
ISBN (print): 9781728198354
Image inpainting has made significant progress thanks to the advantages of convolutional neural networks (CNNs), and deep learning-based methods have shown extraordinary performance in this field. In this paper, we propose a novel, purely CNN-based image inpainting architecture that jointly reconstructs the structure and texture of the image. Our generative network architecture (TSFC) consists of two parallel stages: structure generation and texture generation. In the structure generation stage, we use large convolution kernels, which are largely neglected in modern networks, and exploit their effective perceptual field to enhance the perception of overall structural features. In the texture generation stage, we use small convolution kernels to extract local texture features. Qualitative and quantitative experimental results on the CelebA-HQ and Paris Street View datasets demonstrate the effectiveness and superiority of our method.