Inverting seismic data to build 3D geological structures is challenging because of the overwhelming amount of acquired seismic data and the very high computational load of the iterative numerical solutions of the wave equation required by industry-standard tools such as Full Waveform Inversion (FWI). For example, in an area with surface dimensions of 4.5 km × 4.5 km, hundreds of seismic shot-gather cubes are required for 3D model reconstruction, leading to terabytes of recorded data. This paper presents a deep learning solution for the reconstruction of realistic 3D models in the presence of field noise recorded in seismic surveys. We implement and analyze a convolutional encoder-decoder architecture that efficiently processes the entire collection of hundreds of seismic shot-gather cubes. The proposed solution demonstrates that realistic 3D models can be reconstructed with a structural similarity index measure (SSIM) of 0.9143 (out of 1.0) in the presence of field noise at a signal-to-noise ratio of 10 dB.
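For readers unfamiliar with the SSIM score quoted above, the following is a minimal sketch of the standard SSIM formula evaluated over a single global window. It is not the authors' evaluation code: practical SSIM implementations average the score over local sliding windows, and the function name `global_ssim`, the flattened-list inputs, and the `data_range` default are assumptions made for illustration.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equal-length flattened signals.

    Simplification (assumption): one global window instead of the usual
    average over local sliding windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Conventional stabilising constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical inputs score 1.0, and the score falls as the reconstruction's mean, contrast, or structure drifts from the reference.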
Semantic segmentation, as a dense pixelwise classification task, is of great significance to scene understanding. Many approaches based on convolutional neural networks still suffer from two kinds of challenges: (1) insufficient semantic information results in semantic obfuscation between similar categories, and (2) loss of spatial information leads to inaccurate localization of inconspicuous objects. To tackle these challenges, we design a network with an encoder-decoder architecture based on two proposed modules: a global pyramid attention module (GPAM) and a pyramid decoder module (PDM). Specifically, GPAM exploits an attention mechanism as global prior knowledge to adaptively capture discriminative features for enhancing semantic representation, and PDM employs small convolutions connected in parallel to predict adjacent position relationships for refining spatial information. A series of ablation experiments is conducted to demonstrate the effectiveness of our designs, and our network achieves a mean intersection over union score of 83.4% on the PASCAL VOC 2012 dataset and 78.5% on the Cityscapes dataset. © 2019 SPIE and IS&T
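The mean intersection over union metric reported above can be sketched as follows. This is an illustrative computation on flattened label lists, not the benchmark's official evaluation script; the function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes for flattened label lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither list
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with predictions `[0, 0, 1, 1]` and ground truth `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.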
In this paper, we propose a novel multi-modal multi-task encoder-decoder pre-training framework (MMSpeech) for Mandarin automatic speech recognition (ASR), which employs both unlabeled speech and text data. The main difficulty in speech-text joint pre-training comes from the significant difference between the speech and text modalities, especially for Mandarin. Unlike English and other languages with an alphabetic writing system, Mandarin uses an ideographic writing system in which character and sound are not tightly mapped to one another. Therefore, we propose to introduce the phoneme modality into pre-training, which can help capture modality-invariant information between Mandarin speech and text. In addition, a much larger amount of unsupervised text data (292G) is utilized for pre-training, which brings significant improvements. Experiments on AISHELL-1 show that our proposed method achieves state-of-the-art performance, with a relative improvement of more than 40%.
ISBN (print): 9781510848764
End-to-end training of deep learning-based models allows for implicit learning of intermediate representations based on the final task loss. However, the end-to-end approach ignores the useful domain knowledge encoded in explicit intermediate-level supervision. We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches. We present experiments on conversational speech recognition where we use lower-level tasks, such as phoneme recognition, in a multitask training approach with an encoder-decoder model for direct character transcription. We compare multiple types of lower-level tasks and analyze the effects of the auxiliary tasks. Our results on the Switchboard corpus show that this approach improves recognition accuracy over a standard encoder-decoder model on the Eval2000 test set.
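The multitask idea described above is commonly realised as a weighted sum of the main task loss and an auxiliary loss. The sketch below shows only that loss combination for a single prediction step; it does not reproduce where the paper attaches the auxiliary supervision inside the network. The names `cross_entropy` and `multitask_loss` and the `aux_weight` value are hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def multitask_loss(char_probs, char_target, phone_probs, phone_target,
                   aux_weight=0.3):
    """Main character loss plus a weighted auxiliary phoneme loss.

    aux_weight is an illustrative hyperparameter, not a value from the paper.
    """
    main = cross_entropy(char_probs, char_target)
    aux = cross_entropy(phone_probs, phone_target)
    return main + aux_weight * aux
```

Setting `aux_weight` to zero recovers plain end-to-end training on the character loss alone, which makes the auxiliary term easy to ablate.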
ISBN (print): 9781538695555
Colorectal cancer is the third most common cancer and a cause of cancer-related deaths. Early diagnosis of polyps by colonoscopy can therefore lead to successful treatment. Diagnosis of polyps in colonoscopy videos is a challenging task due to variations in the size and shape of polyps. In this paper, we propose a polyp segmentation method based on an encoder-decoder network. The performance of the method is enhanced by two strategies. First, in the training phase, we apply a novel database augmentation method for colonoscopy images. Second, in the test phase, we make an effective prediction by combining multiple models and comparing the probabilities that the networks produce for each image. Evaluation on the ETIS-LariPolypDB [9] database shows that our proposed method outperforms state-of-the-art results.
In the process of converting food-processing by-products to value-added ingredients, fine-grained control of the raw materials, enzymes, and process conditions ensures the best possible yield and economic return. However, when raw material batches are poorly characterized and vary widely from batch to batch, online or at-line monitoring of the enzymatic reactions would be beneficial. We investigate the potential of deep neural networks in predicting the future state of enzymatic hydrolysis as described by Fourier-transform infrared spectra of the hydrolysates. Combined with predictions of average molecular weight, this provides a flexible and transparent tool for process monitoring and control, enabling proactive adaptation of process parameters.
ISBN (print): 9781450340694
We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state of the art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.
ISBN (print): 9798350371420; 9781737749769
Peatland classification provides valuable information for greenhouse gas inventory and biodiversity protection. In this paper, we propose an encoder-decoder-based architecture for peatland classification that fuses two open-source satellite data sources, Sentinel-1 and Sentinel-2. We show the effect of fusion by comparing the multi-modal fusion architecture with uni-modal models, each trained on only one input data source. We also investigate the influence of skip connections, the main component of the encoder-decoder, in recovering fine-grained details that are lost during the downsampling process. The experimental results are acquired on a study area in Finland that covers a variety of minerotrophic aapa mire peatlands. The results demonstrate that the multi-modal architecture consistently outperforms uni-modal architectures for peatland classification. In addition, the fusion architecture with one skip connection achieved a total accuracy of 57.44%, an 8.51% accuracy improvement over the model without skip connections.
ISBN (print): 9781728107707
Convolutional neural networks (CNNs) for visual semantic segmentation have been attracting considerable attention recently because of their superior support for many significant tasks, such as autonomous driving, semantic SLAM (simultaneous localization and mapping), and remote sensing surveying and mapping. These applications generally need to be implemented on smart terminals, which requires a hardware platform with high energy efficiency and real-time performance. However, CNNs for semantic segmentation usually contain some symmetrical encoders and decoders, corresponding to the down-sampling process (e.g., pooling, convolution) and the up-sampling process (e.g., unpooling, deconvolution). All of these processes are computing and storage intensive, which limits their applicability in resource-constrained embedded systems. In this paper, an FPGA-based accelerator programmed in OpenCL is proposed. We evaluate its performance on the CamVid dataset. The global accuracy drops by only 2.04% with 8-bit quantization. Additionally, the system achieves 48.89 GOPS and 2.4x real-time performance compared with a CPU when running on an Arria-10 GX1150 device.
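To make the 8-bit quantization mentioned above concrete, here is a minimal sketch of symmetric per-tensor quantization to signed 8-bit integers. The abstract does not state which quantization scheme the accelerator uses, so this scheme and the names `quantize_int8` and `dequantize` are assumptions for illustration only.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # A scale of 1.0 for an all-zero tensor avoids division by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]
```

The round trip loses at most about half a quantization step per value, which is the kind of small error behind the modest accuracy drop such schemes typically incur.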
ISBN (print): 9789881563903
Neural networks have made significant achievements in the field of image restoration. To efficiently repair facial images with large damaged areas, an encoder-decoder structured convolutional neural network is used as the generative model, and skip connections are added between some of its layers to enhance the structure-prediction ability of the generative model and to suppress the repair network's tendency to overfit. The global discrimination network mostly uses the image's edge structure and feature information to ensure that the repaired image, which is the output of the repair network, conforms to visual connectivity, while the local discriminator not only recognizes local consistency but also optimizes finer details. The network structure proposed in this paper combines the encoder-decoder, skip connections, and dual discriminator networks to improve the effect of face completion. The experimental results on CelebA show that the proposed method is superior to other methods in repairing images with large areas of damage.