In this paper we present an approach to training neural networks to generate sequences using successor feature learning from reinforcement learning. The model can be thought of as two components: an MLE-based token generator and an estimator that predicts the future value of the whole sentence. Reinforcement learning has been applied to the exposure bias problem in sequence generation. Compared with other RL algorithms, successor features (SF) can learn a robust value function from observations and rewards by decomposing the value function into two components, a reward predictor and a successor map. An encoder-decoder framework with SF enables the decoder to generate outputs that receive more future reward, meaning the model attends not only to the current word but also to the words that remain to be generated. We demonstrate that the approach improves performance on two translation tasks. (C) 2019 Elsevier B.V. All rights reserved.
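As a rough illustration of the successor-feature decomposition this abstract refers to, the sketch below computes the value of a trajectory as the product of a discounted feature sum (the successor map) and linear reward weights (the reward predictor). The names `phi`, `psi`, and `w` and the toy dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def successor_value(phi, w, gamma=0.95):
    """Value of a trajectory under the successor-feature decomposition.

    phi   : (T, d) array of state features along a sampled trajectory
    w     : (d,)   reward weights, so r_t ~= phi[t] @ w (reward predictor)
    gamma : discount factor
    """
    T, d = phi.shape
    psi = np.zeros(d)
    for t in range(T):
        psi += (gamma ** t) * phi[t]      # successor map: discounted feature sum
    return psi @ w                        # value = successor map . reward weights

# toy usage: 5-step trajectory with 3-dimensional features
rng = np.random.default_rng(0)
phi = rng.normal(size=(5, 3))
w = np.array([0.2, -0.1, 0.5])
print(successor_value(phi, w))
```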
While traditional cartoon character drawings are simple for humans to create, they remain highly challenging for machines to interpret. Parsing alleviates the issue through fine-grained semantic segmentation of images. Although well studied on naturalistic images, research on cartoon parsing is very sparse. Owing to the lack of available datasets and the diversity of artwork styles, cartoon character parsing is harder than the well-known human parsing task. In this paper, we study one type of cartoon instance: cartoon dogs. We introduce a novel dataset for cartoon dog parsing and create a new deep convolutional neural network (DCNN) to tackle the problem. Our dataset contains 965 precisely annotated cartoon dog images with seven semantic part labels. Our new model, called dense feature pyramid network (DFPnet), makes use of recent popular techniques in semantic segmentation to handle cartoon dog parsing efficiently. We achieve an mIoU of 68.39%, a Mean Accuracy of 79.4%, and a Pixel Accuracy of 93.5% on our cartoon dog validation set. Our method outperforms state-of-the-art models from similar tasks trained on our dataset: CE2P for single human parsing and Mask R-CNN for instance segmentation. We hope this work can serve as a starting point for future research toward digital artwork understanding with DCNNs. Our DFPnet and dataset will be publicly available.
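The metrics quoted here (mIoU, Mean Accuracy, Pixel Accuracy) follow the standard semantic-segmentation definitions. The sketch below shows how they are typically computed from a confusion matrix; it is a generic illustration, not the authors' evaluation code.

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Standard parsing metrics from predicted and ground-truth label maps."""
    # confusion matrix: rows = ground truth, cols = prediction
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for g, p in zip(gt.ravel(), pred.ravel()):
        cm[g, p] += 1
    tp = np.diag(cm).astype(float)
    pixel_acc = tp.sum() / cm.sum()
    mean_acc = np.nanmean(tp / cm.sum(axis=1))           # per-class accuracy, averaged
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)    # intersection over union
    miou = np.nanmean(iou)
    return miou, mean_acc, pixel_acc

# toy example with 3 classes on a 4x4 label map
gt   = np.array([[0,0,1,1],[0,0,1,1],[2,2,1,1],[2,2,2,2]])
pred = np.array([[0,0,1,1],[0,1,1,1],[2,2,1,2],[2,2,2,2]])
print(segmentation_metrics(pred, gt, num_classes=3))
```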
Dynamic scene deblurring is a significant technique in the field of computer vision. The multi-scale strategy has been successfully extended to deep end-to-end learning-based deblurring, but its expensive computation gave rise to the multi-patch framework. The success of the multi-patch framework comes from the local residual information passed across the hierarchy. One problem is that the finest levels contribute little to their residuals, so their contributions are dominated by the coarser levels, which limits deblurring performance. To this end, we replace the building blocks of the encoder-decoders in the multi-patch network with nested module blocks, whose powerful and complex representation ability is used to improve deblurring performance. Additionally, an attention mechanism is introduced so that the network can differentiate blur across the whole blurry image of a dynamic scene, further improving its ability to handle blur from moving objects. Our modification boosts the contributions of the finest levels to their residuals and enables the network to learn different weights for feature information extracted from spatially varying blurred images. Extensive experiments show that the improved network achieves competitive performance on the GoPro dataset in terms of PSNR and SSIM.
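The attention mechanism is only described at a high level in this abstract. A minimal sketch of one common choice, a spatial attention gate that reweights encoder features so that spatially varying blur regions can be emphasized, is given below; the module layout and channel sizes are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Reweights feature maps with a per-pixel attention mask in [0, 1]."""
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=1),
            nn.Sigmoid(),                      # per-pixel weight for blur regions
        )

    def forward(self, feats):
        return feats * self.mask(feats)        # emphasize spatially varying blur

# toy usage on a batch of encoder features
feats = torch.randn(2, 64, 32, 32)
print(SpatialAttention(64)(feats).shape)       # torch.Size([2, 64, 32, 32])
```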
Generative (abstractive) text summarization is an important branch of natural language processing. Aiming at the insufficient use of semantic information, limited summary precision, and loss of semantics in current generative summarization methods, an enhanced semantic model based on a dual encoder is proposed, which provides richer semantic information for the sequence-to-sequence architecture. The enhanced attention architecture with dual-channel semantics is optimized, and an empirical distribution and a Gain-Benefit gate are built for decoding. In addition, position embeddings are merged with word embeddings, and TF-IDF (term frequency-inverse document frequency), part-of-speech, and key-score features are added to each word's representation, while the optimal word embedding dimension is tuned. This paper aims to optimize the traditional sequence mapping and word feature representation, enhance the model's semantic understanding, and improve summary quality. The LCSTS and SOGOU datasets are used to validate the proposed method. The experimental results show that the proposed method improves ROUGE scores by 10-13 percentage points compared with the other listed algorithms. We observe that the semantic understanding of the generated summaries is more accurate and the generation quality is better, which indicates a promising application prospect.
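The feature-enriched embedding described here (word plus position embeddings, with TF-IDF, part-of-speech, and key-score features appended) can be sketched as follows. The dimensions, feature count, and projection scheme are assumptions for illustration only, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class EnrichedEmbedding(nn.Module):
    """Word + position embeddings, concatenated with hand-crafted token features."""
    def __init__(self, vocab_size, max_len, emb_dim=256, feat_dim=3):
        super().__init__()
        self.word = nn.Embedding(vocab_size, emb_dim)
        self.pos = nn.Embedding(max_len, emb_dim)
        # project [word+pos ; tf-idf, POS id, key score] back to emb_dim
        self.proj = nn.Linear(emb_dim + feat_dim, emb_dim)

    def forward(self, token_ids, features):
        # token_ids: (B, T) ints; features: (B, T, 3) = TF-IDF, POS tag id, key score
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.word(token_ids) + self.pos(positions)       # merge word and position
        return self.proj(torch.cat([x, features], dim=-1))   # append extra features

# toy usage
emb = EnrichedEmbedding(vocab_size=5000, max_len=128)
ids = torch.randint(0, 5000, (2, 10))
feats = torch.rand(2, 10, 3)
print(emb(ids, feats).shape)   # torch.Size([2, 10, 256])
```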
Video-based sports movement analysis has important application value. Introducing digital video, human-computer interaction, and other technologies into sports training can greatly improve training efficiency. This paper studies the technical characteristics of players in basketball game videos and proposes a behavior analysis method based on deep learning. We first design a method to automatically extract the basketball court and its marking lines. Subsequently, key frames in the video are captured using a spatiotemporal scoring mechanism. Afterward, we develop a behavior recognition and prediction method based on an encoder-decoder framework. The analysis results can be fed back to coaches and data analysts in real time to help them analyze tactics and technical choices. Experiments are carried out on a large basketball video dataset. The results show that the proposed method can effectively identify the motion of people in the video while achieving high behavior analysis accuracy.
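The spatiotemporal scoring mechanism for key-frame capture is only outlined in the abstract. A minimal sketch of the general idea, scoring each frame and keeping the highest-scoring ones, is shown below; the frame-difference score used here is a simple placeholder assumption, not the paper's scoring function.

```python
import numpy as np

def select_key_frames(frames, k=5):
    """Score frames by how much they change from the previous frame, keep the top k."""
    frames = np.asarray(frames, dtype=float)          # (T, H, W) grayscale frames
    scores = np.zeros(len(frames))
    scores[1:] = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))  # temporal change
    return np.sort(np.argsort(scores)[-k:])           # key-frame indices, in order

# toy usage: 30 random frames of size 64x64
frames = np.random.rand(30, 64, 64)
print(select_key_frames(frames, k=5))
```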
ISBN: (Print) 9781467360739; 9781467360753
In this paper, we present an effective encoder-decoder design utilizing the Flexible Cross Correlation (FCC) code for Spectral Amplitude Coding-Optical Code Division Multiple Access (SAC-OCDMA) systems. The FCC code offers a flexible cross-correlation property for any given number of users and weights, and effectively reduces the impact of Multiple-Access Interference (MAI). The proposed FCC SAC-OCDMA encoder-decoder shows superior performance, supporting 100%, 287%, and 331% more active users than MDW (K=60), MFH (K=31), and Hadamard (K=29), respectively.
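The key claim is a controlled cross-correlation between any pair of user code sequences. The sketch below shows how such a property can be checked numerically for a set of binary spectral codes; the example codewords are made up for illustration and are not actual FCC codewords.

```python
import numpy as np
from itertools import combinations

def cross_correlations(codes):
    """In-phase cross-correlation (dot product) between every pair of 0/1 codewords."""
    codes = np.asarray(codes)
    return {(i, j): int(codes[i] @ codes[j])
            for i, j in combinations(range(len(codes)), 2)}

# toy 0/1 code set of weight 2 whose pairwise cross-correlation is 1 (illustrative only)
codes = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]
print(cross_correlations(codes))   # {(0, 1): 1, (0, 2): 1, (1, 2): 1}
```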
Seismic image interpretation is indispensable for the oil and gas industry. Currently, artificial intelligence has been undertaken to increase the level of confidence in exploratory activities. Detecting potentially recoverable hydrocarbon zones (leads) from the viewpoint of computer vision is an emerging problem that demands thorough examination. This paper introduces a processing workflow to recognize geologic leads in seismic images that resorts to encoder-decoder architectures of a convolutional neural network (CNN) accompanied by segmentation maps and post-processing operations. We have used seismic images collected at offshore sites of the Sergipe-Alagoas Basin (northeast of Brazil) as input. After performing a patch-based data augmentation, a total of 29600 patches were obtained. Of these, 24000 were used for training, 5000 for validation, and 600 for testing. Each image generated for the training set was post-processed through reconstruction, thresholding (binarization and deblurring), and outlier removal. By using the dice loss function, intersection-over-union index, and relative areal residual computed after intensive cross-validation training rounds, we show that the accuracy of the network in detecting leads is higher than 80%. Furthermore, the validation error limits were found to be stable within 5%-10% in all validation rounds, resulting in a fairly accurate prediction of the pre-labelled hydrocarbon spots.
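The evaluation relies on the Dice loss and the intersection-over-union index. Below is a minimal sketch of both for binary lead masks, using the standard formulations rather than the authors' exact code.

```python
import numpy as np

def dice_loss(pred, gt, eps=1e-6):
    """1 - Dice coefficient for soft binary masks in [0, 1]."""
    inter = (pred * gt).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred, gt, thresh=0.5, eps=1e-6):
    """Intersection-over-union after thresholding the prediction."""
    p = pred > thresh
    g = gt > 0.5
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return (inter + eps) / (union + eps)

# toy masks: prediction overlaps ground truth on 3 of 4 positive pixels
gt   = np.array([[1, 1], [1, 1]], dtype=float)
pred = np.array([[0.9, 0.8], [0.7, 0.1]], dtype=float)
print(dice_loss(pred, gt), iou(pred, gt))
```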
The encoder-decoder framework has been widely used for video captioning with promising results, and various attention mechanisms have been proposed to further improve performance. While temporal attention determines where to look, semantic attention decides the context. However, the combination of semantic and temporal attention has never been exploited for video captioning. To tackle this issue, we propose an end-to-end pipeline named Fused GRU with Semantic-Temporal Attention (STA-FG), which explicitly incorporates high-level visual concepts into the generation of semantic-temporal attention for video captioning. The encoder network extracts visual features from the videos and predicts their semantic concepts, while the decoder network focuses on efficiently generating coherent sentences using both the visual features and the semantic concepts. Specifically, the decoder combines the visual and semantic representations and incorporates a semantic and temporal attention mechanism in a fused GRU network to accurately learn the sentences for video captioning. We experimentally evaluate our approach on the two prevalent datasets MSVD and MSR-VTT, and the results show that our STA-FG achieves the current best performance in both BLEU and METEOR. (C) 2019 Elsevier B.V. All rights reserved.
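In the spirit of the semantic-temporal attention described above, the sketch below shows one decoder step that attends over frame features and feeds a semantic-concept vector into a GRU cell. The module layout, fusion by concatenation, and dimensions are assumptions for illustration, not the STA-FG specification.

```python
import torch
import torch.nn as nn

class SemanticTemporalStep(nn.Module):
    """One decoder step: temporal attention over frames + semantic concept input."""
    def __init__(self, feat_dim, sem_dim, hid_dim, vocab_size):
        super().__init__()
        self.attn = nn.Linear(feat_dim + hid_dim, 1)        # temporal attention scores
        self.gru = nn.GRUCell(feat_dim + sem_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, frame_feats, sem_concepts, h):
        # frame_feats: (B, T, feat_dim); sem_concepts: (B, sem_dim); h: (B, hid_dim)
        B, T, _ = frame_feats.shape
        scores = self.attn(torch.cat(
            [frame_feats, h.unsqueeze(1).expand(B, T, -1)], dim=-1))  # (B, T, 1)
        alpha = torch.softmax(scores, dim=1)
        context = (alpha * frame_feats).sum(dim=1)           # attended visual context
        h = self.gru(torch.cat([context, sem_concepts], dim=-1), h)
        return self.out(h), h                                # word logits, new state

# toy usage
step = SemanticTemporalStep(feat_dim=512, sem_dim=300, hid_dim=256, vocab_size=1000)
logits, h = step(torch.randn(2, 20, 512), torch.randn(2, 300), torch.zeros(2, 256))
print(logits.shape)   # torch.Size([2, 1000])
```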
Magnetic Resonance Images (MRI) are often contaminated by Rician noise at acquisition time. This type of noise typically deteriorates the performance of disease diagnosis by a human observer or an automated system. Thus, it is necessary to remove the Rician noise from MRI scans as a preprocessing step. In this letter, we propose a novel Convolutional Neural Network (CNN), viz. CNN-DMRI, for denoising of MRI scans. The network uses a set of convolutions to separate the image features from the noise. The network also employs an encoder-decoder structure for preserving the prominent features of the image while ignoring unnecessary ones. The training of the network is carried out end-to-end using a residual learning scheme. The performance of the proposed CNN has been tested qualitatively and quantitatively on one simulated and four real MRI datasets. Extensive experimental findings suggest that the proposed network can denoise MRI images effectively without losing crucial image details. (C) 2020 Elsevier B.V. All rights reserved.
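Residual learning for denoising usually means the network predicts the noise map and subtracts it from its input. The sketch below illustrates that idea with a tiny CNN together with a simple Rician noise simulator; both are generic illustrations under that assumption, not CNN-DMRI itself.

```python
import torch
import torch.nn as nn

def add_rician_noise(img, sigma=0.05):
    """Simulate Rician noise: magnitude of a complex signal with Gaussian noise."""
    real = img + sigma * torch.randn_like(img)
    imag = sigma * torch.randn_like(img)
    return torch.sqrt(real ** 2 + imag ** 2)

class ResidualDenoiser(nn.Module):
    """Tiny residual CNN: learns the noise map, output = input - predicted noise."""
    def __init__(self, channels=1, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, noisy):
        return noisy - self.body(noisy)    # residual learning: subtract predicted noise

# toy usage on a random "MRI slice"
clean = torch.rand(1, 1, 64, 64)
noisy = add_rician_noise(clean)
print(ResidualDenoiser()(noisy).shape)     # torch.Size([1, 1, 64, 64])
```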
Skin lesion detection and classification are critical in diagnosing skin malignancy. Existing deep learning-based computer-aided diagnosis (CAD) methods still perform poorly on challenging skin lesions with complex features such as fuzzy boundaries, artifact presence, low contrast with the background, and limited training datasets. They also rely heavily on suitable tuning of millions of parameters, which often leads to over-fitting, poor generalization, and heavy consumption of computing resources. This study proposes a new framework that performs both segmentation and classification of skin lesions for automated detection of skin cancer. The proposed framework consists of two stages: the first stage leverages an encoder-decoder Fully Convolutional Network (FCN) to learn the complex and inhomogeneous skin lesion features, with the encoder learning the coarse appearance and the decoder learning the lesion border details. Our FCN is designed with its sub-networks connected through a series of skip pathways that incorporate both long skip and short-cut connections, unlike the long skip connections alone commonly used in the traditional FCN, for a residual learning strategy and effective training. The network also integrates a Conditional Random Field (CRF) module, which employs a linear combination of Gaussian kernels for its pairwise edge potentials, for contour refinement and lesion boundary localization. The second stage proposes a novel FCN-based DenseNet framework composed of dense blocks that are merged and connected via a concatenation strategy and transition layers. The system also employs hyper-parameter optimization techniques to reduce network complexity and improve computing efficiency. This approach encourages feature reuse, thus requiring a small number of parameters and remaining effective with limited data. The proposed model was evaluated on the publicly available HAM10000 dataset of over 10000 images consisting of 7 different categories of d...
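The DenseNet component described in the second stage concatenates each layer's output with all earlier feature maps in the block, which is what drives the feature reuse and small parameter count mentioned above. Below is a minimal sketch of that concatenation strategy; the growth rate and layer sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all previous feature maps (feature reuse)."""
    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate      # next layer also sees this layer's output

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # concatenate everything so far
        return torch.cat(feats, dim=1)

# toy usage: 16 input channels -> 16 + 4 * 12 = 64 output channels
block = DenseBlock(in_channels=16)
print(block(torch.randn(1, 16, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```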