ISBN (print): 9781510635869
Inspecting shipping containers using X-ray imagery is critical to safeguarding our borders. One of the major tasks in inspecting shipping containers is manifest verification, which has two components: 1) determining what cargo a shipping container holds, which can be carried out through cargo segmentation, and 2) comparing the cargo in the container with the cargo declared in the manifest. We focus our study on cargo segmentation. Cargo segmentation is the process of partitioning the cargo inside the container into regions of similar appearance, assigning a cargo class label to each pixel in the X-ray images. Our contribution is the development of a deep learning neural network based cargo segmentation algorithm that significantly improves on traditional ways of performing cargo segmentation. The cargo segmentation process is implemented by first partitioning the X-ray images into image tiles of a certain size, and then training a deep learning (DL) model-based semantic segmentation algorithm on the annotated image tiles to partition the cargo into regions of similar appearance. The DL-based semantic segmentation algorithm we use is an encoder-decoder structure often employed for semantic segmentation. The DL network implementation chosen for our cargo segmentation is DeepLab v3+, which includes the atrous separable convolution composed of a depthwise convolution and a pointwise convolution. The X-ray cargo images used for development come from a government-provided data set (GPD).
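As context for the atrous separable convolution this abstract names, below is a minimal PyTorch sketch of the operator as DeepLab v3+ defines it: a dilated depthwise convolution followed by a 1×1 pointwise convolution. The channel counts, dilation rate, and tile size are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class AtrousSeparableConv(nn.Module):
    """Depthwise atrous (dilated) convolution followed by a 1x1 pointwise
    convolution, the building block used in DeepLab v3+."""
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        # Depthwise: one filter per input channel; dilation enlarges the
        # receptive field without adding parameters.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution that mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Hypothetical 512x512 X-ray image tile with 64 feature channels.
tile_features = torch.randn(1, 64, 512, 512)
out = AtrousSeparableConv(64, 128, dilation=2)(tile_features)
print(out.shape)  # torch.Size([1, 128, 512, 512])
```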
ISBN (print): 9781728162157
Tongue crack segmentation is an essential component of computer-aided diagnosis in Traditional Chinese Medicine (TCM). However, existing methods are inadequate when dealing with the vague boundary of the foreground and the variation among tongue images. To this end, we propose a P-shaped neural network architecture based on a lightweight encoder-decoder structure: the encoder transforms pixel position information into channel information by aggregating adjacent pixel values; the decoder restores the image size and obtains refined pixel-level extraction results by integrating the information of the corresponding layer in the encoder. To further improve the utilization of network parameters and the model's generalization ability, we design three novel sub-modules: (1) the phantom module utilizes cheap operations to generate feature maps, speeding up the computation; (2) the dual-input module increases the original input information to enhance the model's understanding of the foreground; (3) the dual attention gate module strengthens the fusion of high-level and low-level feature maps, retaining good boundary information while capturing fine detail. Additionally, we propose a pre-training method based on cropped patch images, which makes the model sensitive to details of the foreground before formal training. We demonstrate the model's effectiveness on our constructed dataset, achieving 60.6% IoU accuracy; segmenting a 513×513 image takes 390 ms on CPU. Our dataset is available at https://***/pengjianqiang/FDU-TC.
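The abstract describes the phantom module only as generating feature maps from cheap operations. The sketch below assumes the GhostNet-style pattern that description suggests: a small ordinary convolution produces half the output channels, and a cheap depthwise convolution derives the rest. Channel counts are illustrative, and the paper's actual module may differ in detail.

```python
import torch
import torch.nn as nn

class PhantomModule(nn.Module):
    """Ghost-style block: an ordinary convolution produces half the output
    channels; a cheap depthwise convolution derives the other half."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        primary_ch = out_ch // 2
        self.primary = nn.Conv2d(in_ch, primary_ch, 3, padding=1, bias=False)
        # Depthwise 3x3: one filter per channel, far cheaper than a full conv.
        self.cheap = nn.Conv2d(primary_ch, out_ch - primary_ch, 3, padding=1,
                               groups=primary_ch, bias=False)

    def forward(self, x):
        y = self.primary(x)
        # Concatenate the primary maps with their cheap "phantom" copies.
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 32, 128, 128)
print(PhantomModule(32, 64)(x).shape)  # torch.Size([1, 64, 128, 128])
```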
Brain tumors are among the most aggressive and lethal cancers, leading to short life expectancy. A reliable and efficient automatic or semi-automatic segmentation method is significant for clinical practice. In recent years, deep learning-based methods have achieved great success in brain tumor segmentation. However, due to limitations of parameter count and computational complexity, there is still much room for improvement in these methods. In this paper, we propose an efficient 3D residual neural network (ERV-Net) for brain tumor segmentation, which has lower computational complexity and GPU memory consumption. In ERV-Net, a computation-efficient network, 3D ShuffleNetV2, is first utilized as the encoder to reduce GPU memory use and improve the efficiency of ERV-Net, and then a decoder with residual blocks (Res-decoder) is introduced to avoid degradation. Furthermore, a fusion loss function, composed of Dice loss and cross-entropy loss, is developed to address the problems of network convergence and data imbalance. Moreover, a concise and effective post-processing method is proposed to refine the coarse segmentation result of ERV-Net. The experimental results on the dataset of the multimodal brain tumor segmentation challenge 2018 (BRATS 2018) demonstrate that ERV-Net achieves the best performance, with Dice of 81.8%, 91.21% and 86.62% and Hausdorff distance of 2.70 mm, 3.88 mm and 6.79 mm for enhancing tumor, whole tumor and tumor core, respectively. Besides, ERV-Net also achieves high efficiency compared to state-of-the-art methods.
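The fusion loss named here combines Dice loss with cross-entropy loss; a minimal PyTorch sketch follows. Equal weighting of the two terms and the tensor shapes are assumptions for illustration, since the abstract does not give the weighting.

```python
import torch
import torch.nn.functional as F

def fusion_loss(logits, target, eps=1e-5):
    """Cross-entropy plus soft Dice loss, as described in the abstract,
    for class-imbalanced volumetric segmentation.

    logits: (N, C, D, H, W) raw outputs; target: (N, D, H, W) class labels.
    """
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1])
    one_hot = one_hot.permute(0, 4, 1, 2, 3).float()   # to (N, C, D, H, W)
    dims = (0, 2, 3, 4)            # reduce over batch and spatial axes
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = 1.0 - (2.0 * intersection + eps) / (cardinality + eps)
    return ce + dice.mean()       # equal weighting is an assumption

logits = torch.randn(2, 4, 16, 32, 32)            # 4 hypothetical classes
target = torch.randint(0, 4, (2, 16, 32, 32))
print(fusion_loss(logits, target))
```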
Thermal wave imaging is a nondestructive testing (NDT) technology widely used to detect defects in various materials. For quality control purposes it is important to be able to clearly delineate the defective areas. Due to the diffusive nature of thermal waves, the acquired images contain varying degrees of blur depending on the depth of the defects, which severely affects the ability to delineate them. Conventional edge enhancement algorithms can hardly achieve desirable results. Using deep convolutional neural networks, we designed a deep residual network based on an encoder-decoder structure. Through the residual and skip-connection structures, we can effectively mitigate the vanishing gradient problem and improve the network's feature extraction ability. The experimental results demonstrate that the proposed method shows superior performance over conventional image enhancement algorithms, providing richer information with higher contrast and more detail.
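To make the residual-and-skip-connection idea concrete, here is a minimal PyTorch sketch of the basic unit such a network stacks: the identity shortcut lets gradients bypass the convolution stack, which is what addresses the vanishing gradient problem. Channel counts and image size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity skip connection: the block
    learns only the residual on top of its input."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        # Identity shortcut keeps a direct gradient path through the block.
        return torch.relu(self.body(x) + x)

thermal_feat = torch.randn(1, 64, 256, 256)  # encoder features of one image
print(ResidualBlock(64)(thermal_feat).shape)  # torch.Size([1, 64, 256, 256])
```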
ISBN (print): 9781509066315
Camera shake and target movement often lead to undesirable image blurring in videos. How to exploit the spatial-temporal information of adjacent frames and how to reduce the processing time of deblurring are two major issues in video deblurring. In this paper, we propose a simple yet effective Fourier-accumulation-embedded 3D convolutional encoder-decoder network for video deblurring. First, a 3D convolutional encoder-decoder module is constructed to extract multiscale spatial-temporal deep features and generate intermediate deblurred frames whose complementary information benefits the deblurring of each frame. Then we embed a Fourier accumulation module after the 3D convolutional encoder-decoder; it fuses the intermediate deblurred frames with learned weights in the Fourier domain to produce sharper deblurred frames. Experimental results show that our method has competitive performance compared with other state-of-the-art methods.
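A minimal sketch of the Fourier-domain fusion idea, assuming the simplest possible form: one learnable scalar weight per intermediate frame, applied to the frames' 2-D spectra before an inverse transform. The paper's learned weights are very likely richer (e.g., per-frequency), so treat this as an illustration of the mechanism only.

```python
import torch
import torch.nn as nn

class FourierAccumulation(nn.Module):
    """Fuses a stack of intermediate deblurred frames by combining their
    spectra with learnable per-frame weights, then transforming back."""
    def __init__(self, num_frames):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_frames))

    def forward(self, frames):                   # frames: (N, T, C, H, W)
        spectra = torch.fft.rfft2(frames)        # per-frame 2-D FFT
        w = torch.softmax(self.weights, dim=0)   # convex frame combination
        fused = (spectra * w.view(1, -1, 1, 1, 1)).sum(dim=1)
        return torch.fft.irfft2(fused, s=frames.shape[-2:])

frames = torch.randn(1, 5, 3, 64, 64)  # 5 intermediate deblurred frames
print(FourierAccumulation(5)(frames).shape)  # torch.Size([1, 3, 64, 64])
```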
ISBN (print): 9781728163710
In this paper, we propose an end-to-end solution for image matting, i.e., high-precision extraction of foreground objects from natural images. Image matting and background detection can be achieved easily through chroma keying in a studio setting where the background is pure green or blue. Nonetheless, image matting in natural scenes with complex backgrounds and uneven depth remains a tedious task that requires human intervention. To achieve fully automatic foreground extraction in natural scenes, we propose a method that assimilates semantic segmentation and deep image matting into a single network to generate detailed semantic mattes for the image composition task. The contribution of our proposed method is two-fold: first, it can be interpreted as a fully automated semantic image matting method, and second, as a refinement of existing semantic segmentation models. We propose a novel model architecture that combines segmentation and matting and unifies the upsampling and downsampling operators with the notion of attention. As shown in our work, attention-guided downsampling and upsampling can extract high-quality boundary details, unlike ordinary downsampling and upsampling techniques. To achieve this, we use an attention-guided encoder-decoder framework that learns, without supervision, to generate an attention map adaptively from the data to direct the upsampling and downsampling operators. We also construct a fashion e-commerce focused dataset with high-quality alpha mattes to facilitate training and evaluation for image matting.
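To illustrate the gating idea behind attention-guided upsampling, here is a minimal PyTorch sketch: decoder features are upsampled and then modulated by an attention map predicted from the matching encoder features. This shows only the mechanism; the paper's operator (and its downsampling counterpart) is learned jointly within the full network, and all sizes here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionUpsample(nn.Module):
    """Upsamples decoder features and gates them with an attention map
    computed from the encoder features at the same resolution, helping
    preserve fine boundary detail."""
    def __init__(self, ch):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())

    def forward(self, decoder_feat, encoder_feat):
        up = F.interpolate(decoder_feat, size=encoder_feat.shape[-2:],
                           mode='bilinear', align_corners=False)
        return up * self.attn(encoder_feat)   # boundary-aware gating

dec = torch.randn(1, 32, 64, 64)
enc = torch.randn(1, 32, 128, 128)
print(AttentionUpsample(32)(dec, enc).shape)  # torch.Size([1, 32, 128, 128])
```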
ISBN (print): 9781450379984
Trend-driven retail industries such as fashion launch a substantial number of new products every season. In such a scenario, an accurate demand forecast for these newly launched products is vital for efficient downstream supply chain planning such as assortment planning and stock allocation. While classical time-series forecasting algorithms can forecast sales for existing products, new products have no historical time-series data on which to base a forecast. In this paper, we propose and empirically evaluate several novel attention-based multi-modal encoder-decoder models that forecast the sales of a new product purely from product images, any available product attributes, and external factors such as holidays, events, weather, and discounts. We experimentally validate our approaches on a large fashion dataset and report improvements in accuracy and enhanced model interpretability compared with existing k-nearest-neighbor-based baseline approaches.
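The abstract does not specify the architecture in detail, so the sketch below is only a schematic of how an attention-based multi-modal encoder-decoder for this task could be wired: attention pools per-modality embeddings (image, attributes, external factors) into a context vector, and a recurrent decoder rolls out the sales curve. Every dimension and module choice here is a hypothetical placeholder.

```python
import torch
import torch.nn as nn

class MultiModalForecaster(nn.Module):
    """Attends over per-modality embeddings and decodes a sales curve.
    The returned attention weights give per-modality interpretability."""
    def __init__(self, d=64, horizon=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, d))
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, 1)
        self.horizon = horizon

    def forward(self, modalities):           # (N, M, d): one row per modality
        q = self.query.expand(modalities.size(0), -1, -1)
        ctx, weights = self.attn(q, modalities, modalities)
        seq = ctx.repeat(1, self.horizon, 1)  # feed context at every step
        out, _ = self.decoder(seq)
        return self.head(out).squeeze(-1), weights

feats = torch.randn(8, 3, 64)   # image, attribute, external embeddings
sales, attn_w = MultiModalForecaster()(feats)
print(sales.shape)              # torch.Size([8, 12]): 12-step forecast
```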
ISBN (print): 9781728194295
Long acquisition time (AQT) due to the serial acquisition of multi-modality MR images (especially T2-weighted images (T2WI), which have longer AQT), though beneficial for disease diagnosis, is practically undesirable. We propose a novel deep network based solution to reconstruct T2W images from T1W images (T1WI) using an encoder-decoder architecture. The proposed learning is aided with semantic features by using a multi-channel input comprising intensity values and the gradient of the image in two orthogonal directions. A reconstruction module (RM) augmenting the network, along with a domain adaptation module (DAM), which is an encoder-decoder model built with a sharp bottleneck module (SBM), is trained via modular training. The proposed network significantly reduces the total AQT with negligible qualitative artifacts and quantitative loss (it reconstructs one volume in ~1 second). Testing is done on a publicly available dataset with real MR images, and the proposed network shows an increase of ~1 dB in PSNR over the state of the art (SOTA).
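A small sketch of the multi-channel input construction the abstract describes: the T1W intensity image stacked with its gradients in two orthogonal directions. The paper does not state which gradient operator it uses; forward differences are assumed here for simplicity.

```python
import torch

def build_input(t1w):
    """Stacks a T1W slice with its gradients along rows and columns,
    yielding the 3-channel input described in the abstract."""
    # Forward differences, padded by repeating the last row/column so the
    # gradient maps keep the original spatial size.
    gy = torch.diff(t1w, dim=-2, append=t1w[..., -1:, :])
    gx = torch.diff(t1w, dim=-1, append=t1w[..., :, -1:])
    return torch.stack([t1w, gy, gx], dim=0)   # (3, H, W)

t1w = torch.rand(256, 256)       # one hypothetical T1-weighted slice
print(build_input(t1w).shape)    # torch.Size([3, 256, 256])
```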
ISBN (print): 9781713820697
Modern neural text-to-speech (TTS) synthesis can generate speech that is indistinguishable from natural speech. However, the prosody of generated utterances often represents the average prosodic style of the database rather than exhibiting wide prosodic variation. Moreover, the generated prosody is solely defined by the input text, which does not allow different styles for the same sentence. In this work, we train a sequence-to-sequence neural network conditioned on acoustic speech features to learn a latent prosody space with intuitive and meaningful dimensions. Experiments show that a model conditioned on sentence-wise pitch, pitch range, phone duration, energy, and spectral tilt can effectively control each prosodic dimension and generate a wide variety of speaking styles, while maintaining a mean opinion score (4.23) similar to our Tacotron baseline (4.26).
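The abstract does not say how the five sentence-level features enter the network, so the sketch below assumes one common injection scheme: project the feature vector and concatenate it to every encoder timestep before attention/decoding. All dimensions and the concatenation choice are hypothetical.

```python
import torch
import torch.nn as nn

class ProsodyConditioning(nn.Module):
    """Projects five sentence-level prosody values (pitch, pitch range,
    phone duration, energy, spectral tilt) and concatenates the result
    to every encoder timestep."""
    def __init__(self, prosody_dim=32):
        super().__init__()
        self.proj = nn.Linear(5, prosody_dim)

    def forward(self, encoder_out, prosody):    # (N, T, d), (N, 5)
        p = self.proj(prosody).unsqueeze(1)     # (N, 1, prosody_dim)
        p = p.expand(-1, encoder_out.size(1), -1)
        return torch.cat([encoder_out, p], dim=-1)

enc = torch.randn(2, 50, 256)    # phone-level encoder states
prosody = torch.randn(2, 5)      # five prosodic control dimensions
print(ProsodyConditioning()(enc, prosody).shape)  # torch.Size([2, 50, 288])
```

Because each of the five values is a separate input dimension, sweeping one while holding the others fixed is what allows per-dimension style control at synthesis time.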
ISBN (print): 9781728110516
Machine Translation (MT) refers to the automated software-based translation of natural-language text. The embedded complexities and incompatibilities of natural languages have made MT a daunting task facing numerous challenges, especially when its output is compared with manual translation. With the emergence of deep-learning AI approaches, Neural Machine Translation (NMT) has pushed MT results closer to human expectations. One of the newest deep learning approaches is the sequence-to-sequence approach based on Recurrent Neural Networks (RNNs), complex convolutions, and transformers, employing encoder/decoder pairs. In this study, an attention-based deep learning architecture is proposed for MT, with all layers focused exclusively on multi-head attention and based on a transformer that includes multi-layer encoders/decoders. The main contribution of the proposed model lies in the weighted combination of each layer's primary input and the outputs of the preceding layers, which together feed into the next layer. This mechanism results in a more accurate transformation compared to non-hybrid inputs. The model is evaluated using two datasets for German/English translation: the WMT'14 dataset for training and the newstest2012 dataset for testing. The experiments are run on GPU-equipped Google Colab instances, and the results show a score of 36.7 BLEU, a 5% improvement over previous work without the hybrid-input technique.
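A minimal sketch of the hybrid-input mechanism as described: the input to layer k is a learned weighted combination of the stack's primary input and the outputs of all preceding layers. The softmax normalization of the weights is an assumption; the paper may combine the states differently.

```python
import torch
import torch.nn as nn

class HybridInputCombiner(nn.Module):
    """Forms the input of transformer layer k as a learned weighted sum of
    the stack's primary input and all preceding layer outputs."""
    def __init__(self, layer_index):
        super().__init__()
        # One weight for the embedding plus one per preceding layer output.
        self.weights = nn.Parameter(torch.ones(layer_index + 1))

    def forward(self, states):   # list of (N, T, d): [embedding, out_1, ...]
        w = torch.softmax(self.weights, dim=0)     # normalized combination
        stacked = torch.stack(states, dim=0)       # (k+1, N, T, d)
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)

states = [torch.randn(4, 10, 512) for _ in range(3)]  # input + 2 layer outputs
print(HybridInputCombiner(2)(states).shape)           # torch.Size([4, 10, 512])
```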