ISBN (print): 9781510635869
Inspecting shipping containers using X-ray imagery is critical to safeguarding our borders. One of the major tasks in inspecting shipping containers is manifest verification, which has two components: 1) determining what cargo a shipping container holds, which can be carried out through cargo segmentation, and 2) comparing the cargo in the container with the cargo declared in the manifest. We focus our study on cargo segmentation. Cargo segmentation is the process of partitioning the cargo inside the container into regions of similar appearance, assigning a cargo class label to each pixel in the X-ray images. Our contribution is the development of a deep learning neural network based cargo segmentation algorithm that significantly improves on traditional ways of performing cargo segmentation. The cargo segmentation process is implemented by first partitioning the X-ray images into image tiles of a certain size, and then training a deep learning (DL) model-based semantic segmentation algorithm on the annotated image tiles to partition the cargo into regions of similar appearance. The DL-based semantic segmentation algorithm we use is an encoder-decoder structure often employed for semantic segmentation. The DL network implementation chosen for our cargo segmentation is DeepLab v3+, which includes the atrous separable convolution composed of a depthwise convolution and a pointwise convolution. The X-ray cargo images used for development come from a government-provided data set (GPD).
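As context for the atrous separable convolution this abstract names, below is a minimal PyTorch sketch of the operator as DeepLab v3+ defines it: a dilated depthwise convolution followed by a 1×1 pointwise convolution. The channel counts, dilation rate, and tile size are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class AtrousSeparableConv(nn.Module):
    """Depthwise atrous (dilated) convolution followed by a 1x1 pointwise
    convolution, the building block used in DeepLab v3+."""
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        # Depthwise: one filter per input channel; dilation enlarges the
        # receptive field without adding parameters.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution that mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Hypothetical 512x512 X-ray image tile with 64 feature channels.
tile_features = torch.randn(1, 64, 512, 512)
out = AtrousSeparableConv(64, 128, dilation=2)(tile_features)
print(out.shape)  # torch.Size([1, 128, 512, 512])
```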
ISBN (print): 9781728162157
Tongue crack segmentation is an essential component of computer-aided diagnosis in Traditional Chinese Medicine (TCM). However, existing methods are inadequate when dealing with the vague boundary of the foreground and the variation among tongue images. To this end, we propose a P-shaped neural network architecture based on a lightweight encoder-decoder structure: the encoder transforms pixel position information into channel information by aggregating adjacent pixel values; the decoder restores the image size and obtains refined pixel-level extraction results by integrating the information of the corresponding layer in the encoder. To further improve the utilization of network parameters and the model's generalization ability, we design three novel sub-modules: (1) the phantom module utilizes cheap operations to generate feature maps, speeding up the computation; (2) the dual-input module increases the original input information to enhance the model's understanding of the foreground; (3) the dual attention gate module strengthens the fusion of high-level and low-level feature maps, retaining good boundary information while capturing fine detail. Additionally, we propose a pre-training method based on cropped patch images, which makes the model sensitive to details of the foreground before formal training. We demonstrate the model's effectiveness on our constructed dataset, achieving 60.6% IoU accuracy; segmenting a 513×513 image takes 390 ms on CPU. Our dataset is available at https://***/pengjianqiang/FDU-TC.
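The abstract describes the phantom module only as generating feature maps from cheap operations. The sketch below assumes the GhostNet-style pattern that description suggests: a small ordinary convolution produces half the output channels, and a cheap depthwise convolution derives the rest. Channel counts are illustrative, and the paper's actual module may differ in detail.

```python
import torch
import torch.nn as nn

class PhantomModule(nn.Module):
    """Ghost-style block: an ordinary convolution produces half the output
    channels; a cheap depthwise convolution derives the other half."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        primary_ch = out_ch // 2
        self.primary = nn.Conv2d(in_ch, primary_ch, 3, padding=1, bias=False)
        # Depthwise 3x3: one filter per channel, far cheaper than a full conv.
        self.cheap = nn.Conv2d(primary_ch, out_ch - primary_ch, 3, padding=1,
                               groups=primary_ch, bias=False)

    def forward(self, x):
        y = self.primary(x)
        # Concatenate the primary maps with their cheap "phantom" copies.
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 32, 128, 128)
print(PhantomModule(32, 64)(x).shape)  # torch.Size([1, 64, 128, 128])
```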
Brain tumors are among the most aggressive and lethal cancers, leading to short life expectancy. A reliable and efficient automatic or semi-automatic segmentation method is significant for clinical practice. In recent years, deep learning-based methods have achieved great success in brain tumor segmentation. However, due to limitations of parameter count and computational complexity, there is still much room for improvement in these methods. In this paper, we propose an efficient 3D residual neural network (ERV-Net) for brain tumor segmentation, which has lower computational complexity and GPU memory consumption. In ERV-Net, a computation-efficient network, 3D ShuffleNetV2, is first utilized as the encoder to reduce GPU memory use and improve the efficiency of ERV-Net, and then a decoder with residual blocks (Res-decoder) is introduced to avoid degradation. Furthermore, a fusion loss function, composed of Dice loss and cross-entropy loss, is developed to address the problems of network convergence and data imbalance. Moreover, a concise and effective post-processing method is proposed to refine the coarse segmentation result of ERV-Net. The experimental results on the dataset of the multimodal brain tumor segmentation challenge 2018 (BRATS 2018) demonstrate that ERV-Net achieves the best performance, with Dice of 81.8%, 91.21% and 86.62% and Hausdorff distance of 2.70 mm, 3.88 mm and 6.79 mm for enhancing tumor, whole tumor and tumor core, respectively. Besides, ERV-Net also achieves high efficiency compared to state-of-the-art methods.
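The fusion loss named here combines Dice loss with cross-entropy loss; a minimal PyTorch sketch follows. Equal weighting of the two terms and the tensor shapes are assumptions for illustration, since the abstract does not give the weighting.

```python
import torch
import torch.nn.functional as F

def fusion_loss(logits, target, eps=1e-5):
    """Cross-entropy plus soft Dice loss, as described in the abstract,
    for class-imbalanced volumetric segmentation.

    logits: (N, C, D, H, W) raw outputs; target: (N, D, H, W) class labels.
    """
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1])
    one_hot = one_hot.permute(0, 4, 1, 2, 3).float()   # to (N, C, D, H, W)
    dims = (0, 2, 3, 4)            # reduce over batch and spatial axes
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = 1.0 - (2.0 * intersection + eps) / (cardinality + eps)
    return ce + dice.mean()       # equal weighting is an assumption

logits = torch.randn(2, 4, 16, 32, 32)            # 4 hypothetical classes
target = torch.randint(0, 4, (2, 16, 32, 32))
print(fusion_loss(logits, target))
```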
Thermal wave imaging is a nondestructive testing (NDT) technology widely used to detect defects in various materials. For quality control purposes it is important to be able to clearly delineate the defective areas. Due to the diffusive nature of thermal waves, the acquired images contain varying degrees of blur depending on the depth of the defects, which severely affects the ability to delineate them. Conventional edge enhancement algorithms can hardly achieve desirable results. Using deep convolutional neural networks, we designed a deep residual network based on an encoder-decoder structure. Through the residual and skip-connection structures, we can effectively mitigate the vanishing gradient problem and improve the network's feature extraction ability. The experimental results demonstrate that the proposed method shows superior performance over conventional image enhancement algorithms, providing richer information with higher contrast and more detail.
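To make the residual-and-skip-connection idea concrete, here is a minimal PyTorch sketch of the basic unit such a network stacks: the identity shortcut lets gradients bypass the convolution stack, which is what addresses the vanishing gradient problem. Channel counts and image size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity skip connection: the block
    learns only the residual on top of its input."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        # Identity shortcut keeps a direct gradient path through the block.
        return torch.relu(self.body(x) + x)

thermal_feat = torch.randn(1, 64, 256, 256)  # encoder features of one image
print(ResidualBlock(64)(thermal_feat).shape)  # torch.Size([1, 64, 256, 256])
```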
ISBN (print): 9781509066315
Camera shake and target movement often lead to undesirable image blurring in videos. How to exploit the spatial-temporal information of adjacent frames and how to reduce the processing time of deblurring are two major issues in video deblurring. In this paper, we propose a simple yet effective Fourier-accumulation-embedded 3D convolutional encoder-decoder network for video deblurring. First, a 3D convolutional encoder-decoder module is constructed to extract multiscale spatial-temporal deep features and generate intermediate deblurred frames whose complementary information benefits the deblurring of each frame. Then we embed a Fourier accumulation module after the 3D convolutional encoder-decoder; it fuses the intermediate deblurred frames with learned weights in the Fourier domain to produce sharper deblurred frames. Experimental results show that our method has competitive performance compared with other state-of-the-art methods.
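A minimal sketch of the Fourier-domain fusion idea, assuming the simplest possible form: one learnable scalar weight per intermediate frame, applied to the frames' 2-D spectra before an inverse transform. The paper's learned weights are very likely richer (e.g., per-frequency), so treat this as an illustration of the mechanism only.

```python
import torch
import torch.nn as nn

class FourierAccumulation(nn.Module):
    """Fuses a stack of intermediate deblurred frames by combining their
    spectra with learnable per-frame weights, then transforming back."""
    def __init__(self, num_frames):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_frames))

    def forward(self, frames):                   # frames: (N, T, C, H, W)
        spectra = torch.fft.rfft2(frames)        # per-frame 2-D FFT
        w = torch.softmax(self.weights, dim=0)   # convex frame combination
        fused = (spectra * w.view(1, -1, 1, 1, 1)).sum(dim=1)
        return torch.fft.irfft2(fused, s=frames.shape[-2:])

frames = torch.randn(1, 5, 3, 64, 64)  # 5 intermediate deblurred frames
print(FourierAccumulation(5)(frames).shape)  # torch.Size([1, 3, 64, 64])
```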
ISBN (print): 9781728163710
In this paper, we propose an end-to-end solution for image matting, i.e., high-precision extraction of foreground objects from natural images. Image matting and background detection can be achieved easily through chroma keying in a studio setting where the background is pure green or blue. Nonetheless, image matting in natural scenes with complex backgrounds and uneven depth remains a tedious task that requires human intervention. To achieve fully automatic foreground extraction in natural scenes, we propose a method that assimilates semantic segmentation and deep image matting into a single network to generate detailed semantic mattes for the image composition task. The contribution of our proposed method is two-fold: first, it can be interpreted as a fully automated semantic image matting method, and second, as a refinement of existing semantic segmentation models. We propose a novel model architecture that combines segmentation and matting and unifies the upsampling and downsampling operators with the notion of attention. As shown in our work, attention-guided downsampling and upsampling can extract high-quality boundary details, unlike ordinary downsampling and upsampling techniques. To achieve this, we use an attention-guided encoder-decoder framework that learns, without supervision, to generate an attention map adaptively from the data to direct the upsampling and downsampling operators. We also construct a fashion e-commerce focused dataset with high-quality alpha mattes to facilitate training and evaluation for image matting.
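To illustrate the gating idea behind attention-guided upsampling, here is a minimal PyTorch sketch: decoder features are upsampled and then modulated by an attention map predicted from the matching encoder features. This shows only the mechanism; the paper's operator (and its downsampling counterpart) is learned jointly within the full network, and all sizes here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionUpsample(nn.Module):
    """Upsamples decoder features and gates them with an attention map
    computed from the encoder features at the same resolution, helping
    preserve fine boundary detail."""
    def __init__(self, ch):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())

    def forward(self, decoder_feat, encoder_feat):
        up = F.interpolate(decoder_feat, size=encoder_feat.shape[-2:],
                           mode='bilinear', align_corners=False)
        return up * self.attn(encoder_feat)   # boundary-aware gating

dec = torch.randn(1, 32, 64, 64)
enc = torch.randn(1, 32, 128, 128)
print(AttentionUpsample(32)(dec, enc).shape)  # torch.Size([1, 32, 128, 128])
```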
ISBN (print): 9781450379984
Trend-driven retail industries such as fashion launch a substantial number of new products every season. In such a scenario, an accurate demand forecast for these newly launched products is vital for efficient downstream supply chain planning such as assortment planning and stock allocation. While classical time-series forecasting algorithms can forecast sales for existing products, new products have no historical time-series data on which to base a forecast. In this paper, we propose and empirically evaluate several novel attention-based multi-modal encoder-decoder models that forecast the sales of a new product purely from product images, any available product attributes, and external factors such as holidays, events, weather, and discounts. We experimentally validate our approaches on a large fashion dataset and report improvements in accuracy and enhanced model interpretability compared with existing k-nearest-neighbor-based baseline approaches.
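The abstract does not specify the architecture in detail, so the sketch below is only a schematic of how an attention-based multi-modal encoder-decoder for this task could be wired: attention pools per-modality embeddings (image, attributes, external factors) into a context vector, and a recurrent decoder rolls out the sales curve. Every dimension and module choice here is a hypothetical placeholder.

```python
import torch
import torch.nn as nn

class MultiModalForecaster(nn.Module):
    """Attends over per-modality embeddings and decodes a sales curve.
    The returned attention weights give per-modality interpretability."""
    def __init__(self, d=64, horizon=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, d))
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, 1)
        self.horizon = horizon

    def forward(self, modalities):           # (N, M, d): one row per modality
        q = self.query.expand(modalities.size(0), -1, -1)
        ctx, weights = self.attn(q, modalities, modalities)
        seq = ctx.repeat(1, self.horizon, 1)  # feed context at every step
        out, _ = self.decoder(seq)
        return self.head(out).squeeze(-1), weights

feats = torch.randn(8, 3, 64)   # image, attribute, external embeddings
sales, attn_w = MultiModalForecaster()(feats)
print(sales.shape)              # torch.Size([8, 12]): 12-step forecast
```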
ISBN (print): 9781728194295
Long acquisition time (AQT) due to the serial acquisition of multi-modality MR images (especially T2-weighted images (T2WI), which have longer AQT), though beneficial for disease diagnosis, is practically undesirable. We propose a novel deep network based solution to reconstruct T2W images from T1W images (T1WI) using an encoder-decoder architecture. The proposed learning is aided with semantic features by using a multi-channel input comprising intensity values and the gradient of the image in two orthogonal directions. A reconstruction module (RM) augmenting the network, along with a domain adaptation module (DAM), which is an encoder-decoder model built with a sharp bottleneck module (SBM), is trained via modular training. The proposed network significantly reduces the total AQT with negligible qualitative artifacts and quantitative loss (it reconstructs one volume in ~1 second). Testing is done on a publicly available dataset with real MR images, and the proposed network shows an increase of ~1 dB in PSNR over the state of the art (SOTA).
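A small sketch of the multi-channel input construction the abstract describes: the T1W intensity image stacked with its gradients in two orthogonal directions. The paper does not state which gradient operator it uses; forward differences are assumed here for simplicity.

```python
import torch

def build_input(t1w):
    """Stacks a T1W slice with its gradients along rows and columns,
    yielding the 3-channel input described in the abstract."""
    # Forward differences, padded by repeating the last row/column so the
    # gradient maps keep the original spatial size.
    gy = torch.diff(t1w, dim=-2, append=t1w[..., -1:, :])
    gx = torch.diff(t1w, dim=-1, append=t1w[..., :, -1:])
    return torch.stack([t1w, gy, gx], dim=0)   # (3, H, W)

t1w = torch.rand(256, 256)       # one hypothetical T1-weighted slice
print(build_input(t1w).shape)    # torch.Size([3, 256, 256])
```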
ISBN (print): 9781713820697
Modern neural text-to-speech (TTS) synthesis can generate speech that is indistinguishable from natural speech. However, the prosody of generated utterances often represents the average prosodic style of the database rather than exhibiting wide prosodic variation. Moreover, the generated prosody is solely defined by the input text, which does not allow different styles for the same sentence. In this work, we train a sequence-to-sequence neural network conditioned on acoustic speech features to learn a latent prosody space with intuitive and meaningful dimensions. Experiments show that a model conditioned on sentence-wise pitch, pitch range, phone duration, energy, and spectral tilt can effectively control each prosodic dimension and generate a wide variety of speaking styles, while maintaining a mean opinion score (4.23) similar to our Tacotron baseline (4.26).
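The abstract does not say how the five sentence-level features enter the network, so the sketch below assumes one common injection scheme: project the feature vector and concatenate it to every encoder timestep before attention/decoding. All dimensions and the concatenation choice are hypothetical.

```python
import torch
import torch.nn as nn

class ProsodyConditioning(nn.Module):
    """Projects five sentence-level prosody values (pitch, pitch range,
    phone duration, energy, spectral tilt) and concatenates the result
    to every encoder timestep."""
    def __init__(self, prosody_dim=32):
        super().__init__()
        self.proj = nn.Linear(5, prosody_dim)

    def forward(self, encoder_out, prosody):    # (N, T, d), (N, 5)
        p = self.proj(prosody).unsqueeze(1)     # (N, 1, prosody_dim)
        p = p.expand(-1, encoder_out.size(1), -1)
        return torch.cat([encoder_out, p], dim=-1)

enc = torch.randn(2, 50, 256)    # phone-level encoder states
prosody = torch.randn(2, 5)      # five prosodic control dimensions
print(ProsodyConditioning()(enc, prosody).shape)  # torch.Size([2, 50, 288])
```

Because each of the five values is a separate input dimension, sweeping one while holding the others fixed is what allows per-dimension style control at synthesis time.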
ISBN (print): 9781728110516
Machine Translation (MT) refers to the automated software-based translation of natural-language text. The embedded complexities and incompatibilities of natural languages have made MT a daunting task facing numerous challenges, especially when its output is compared with manual translation. With the emergence of deep-learning AI approaches, Neural Machine Translation (NMT) has pushed MT results closer to human expectations. One of the newest deep learning approaches is the sequence-to-sequence approach based on Recurrent Neural Networks (RNNs), complex convolutions, and transformers, employing encoder/decoder pairs. In this study, an attention-based deep learning architecture is proposed for MT, with all layers focused exclusively on multi-head attention and based on a transformer that includes multi-layer encoders/decoders. The main contribution of the proposed model lies in the weighted combination of each layer's primary input and the outputs of the preceding layers, which together feed into the next layer. This mechanism results in a more accurate transformation compared to non-hybrid inputs. The model is evaluated using two datasets for German/English translation: the WMT'14 dataset for training and the newstest2012 dataset for testing. The experiments are run on GPU-equipped Google Colab instances, and the results show a score of 36.7 BLEU, a 5% improvement over previous work without the hybrid-input technique.
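A minimal sketch of the hybrid-input mechanism as described: the input to layer k is a learned weighted combination of the stack's primary input and the outputs of all preceding layers. The softmax normalization of the weights is an assumption; the paper may combine the states differently.

```python
import torch
import torch.nn as nn

class HybridInputCombiner(nn.Module):
    """Forms the input of transformer layer k as a learned weighted sum of
    the stack's primary input and all preceding layer outputs."""
    def __init__(self, layer_index):
        super().__init__()
        # One weight for the embedding plus one per preceding layer output.
        self.weights = nn.Parameter(torch.ones(layer_index + 1))

    def forward(self, states):   # list of (N, T, d): [embedding, out_1, ...]
        w = torch.softmax(self.weights, dim=0)     # normalized combination
        stacked = torch.stack(states, dim=0)       # (k+1, N, T, d)
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)

states = [torch.randn(4, 10, 512) for _ in range(3)]  # input + 2 layer outputs
print(HybridInputCombiner(2)(states).shape)           # torch.Size([4, 10, 512])
```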