Search Results - Inner Mongolia University Library

4th International Workshop on Brainlesion - Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries (BrainLes)

Authors: Hu, Yan Liu, Xiang Wen, Xin Niu, Chen Xia, Yong Northwestern Polytech Univ Sch Comp Sci & Engn Natl Engn Lab Integrated AeroSp Ground Ocean Big Xian 710072 Peoples R China Xi An Jiao Tong Univ Affiliated Hosp 1 Xian 710061 Peoples R China

ISBN: (print) 9783030117269; 9783030117252

Accurate brain tumor segmentation plays a pivotal role in clinical practice and research settings. In this paper, we propose the multi-level upsampling network (MU-Net) to learn the image representations of the transverse, sagittal and coronal views and fuse them to automatically segment brain tumors, including necrosis, edema, non-enhancing, and enhancing tumor, in multimodal magnetic resonance (MR) sequences. The MU-Net model has an encoder-decoder structure, in which low-level feature maps obtained by the encoder and high-level feature maps obtained by the decoder are combined using a newly designed global attention (GA) module. The proposed model has been evaluated on the BraTS 2018 Challenge datasets and achieved average Dice similarity coefficients of 0.88, 0.74 and 0.69 on the validation dataset and 0.85, 0.72 and 0.66 on the testing dataset for the whole tumor, core tumor and enhancing tumor, respectively. Our results indicate that the proposed model has promising performance in automated brain tumor segmentation.

Keywords: Magnetic resonance imaging; Brain tumor segmentation; encoder-decoder; Multi-level upsampling; Global attention
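
The abstract above reports results as Dice similarity coefficients. As a reading aid, here is a minimal sketch of how that overlap score is commonly computed for two binary masks. The function name, the flat 0/1 list representation, and the convention for two empty masks are assumptions made for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total number of tumor voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical toy masks: 3 predicted voxels, 3 true voxels, 2 in common.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, truth)
```

A score of 1.0 means the predicted and reference regions coincide exactly, and 0.0 means they do not overlap at all.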

A New Visual Question Answering System for Medical images characterization

4th International Conference on Smart City Applications (SCA)

Authors: Bghiel, Afrae Dahdouh, Yousra Allaouzi, Imane Ben Ahmed, Mohamed Anouar Boudhir, Abdelhakim UAE LIST Lab FSTT Tangier Morocco

ISBN: (print) 9781450362894

This article presents a system that combines a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN) for visual question answering applied to medical image characterization. In the proposed encoder-decoder model, a pre-trained convolutional neural network extracts image features and a pre-trained word embedding represents the questions and answers.

Keywords: Computer Vision; CNN; RNN; NLP; Transfer Learning; encoder-decoder; LSTM; Word Embedding; Visual Question Answering; Greedy Search

SEMI-SUPERVISED END-TO-END SPEECH RECOGNITION USING TEXT-TO-SPEECH AND AUTOENCODERS

44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Karita, Shigeki Watanabe, Shinji Iwata, Tomoharu Delcroix, Marc Ogawa, Atsunori Nakatani, Tomohiro NTT Commun Sci Labs Kyoto Japan Johns Hopkins Univ Ctr Language & Speech Proc Baltimore MD 21218 USA

ISBN: (print) 9781479981311

We introduce speech and text autoencoders that share encoders and decoders with an automatic speech recognition (ASR) model to improve ASR performance with large speech-only and text-only training datasets. To build the speech and text autoencoders, we leverage state-of-the-art ASR and text-to-speech (TTS) encoder-decoder architectures. These autoencoders learn features from speech-only and text-only datasets by switching the encoders and decoders used in the ASR and TTS models. Simultaneously, they aim to encode features to be compatible with ASR and TTS models by a multi-task loss. Additionally, we anticipate that TTS joint training can also improve the ASR performance because both ASR and TTS models learn transformations between speech and text. In our semi-supervised end-to-end ASR/TTS experiments, retraining a model initially trained on a small paired subset of the LibriSpeech corpus with a large unpaired subset of the corpus reduced the character error rate from 10.4% to 8.4% and the word error rate from 20.6% to 18.0%.

Keywords: speech recognition; semi-supervised learning; autoencoder; encoder-decoder

Advancing sequence-to-sequence based speech recognition

Interspeech Conference

Authors: Tuske, Zoltan Audhkhasi, Kartik Saon, George IBM Res AI Yorktown Hts NY 10598 USA

The paper presents our endeavor to improve state-of-the-art speech recognition results using attention-based neural network approaches. Our test focus was LibriSpeech, a well-known, publicly available, large speech corpus, but the methodologies are clearly applicable to other tasks. After systematic application of standard techniques (sophisticated data augmentation, various dropout schemes, scheduled sampling, warm-restart) and optimization of the search configurations, our model achieves 4.0% and 11.7% word error rate (WER) on the test-clean and test-other sets, without any external language model. A powerful recurrent language model drops the error rate further to 2.7% and 8.2%. Thus, we not only report the lowest sequence-to-sequence model based numbers on this task to date, but our single system even challenges the best result known in the literature, namely a hybrid model together with recurrent language model rescoring. A simple ROVER combination of several of our attention-based systems achieved 2.5% and 7.3% WER on the clean and other test sets.

Keywords: encoder-decoder; attention; speech recognition; LibriSpeech
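
The abstract above reports its results as word error rate (WER). A minimal sketch of the usual definition follows: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name, whitespace tokenization, and the example sentences are assumptions for illustration; the scoring tools and text normalization used in the paper are not described here.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.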

Incorporating Textual Similarity in Video Captioning Schemes

25th IEEE International Conference on Engineering, Technology and Innovation / 25th ICE/IEEE International Technology Management Conference (ITMC)

Authors: Gkountakos, Konstantinos Dimou, Anastasios Papadopoulos, Georgios Th. Daras, Petros Ctr Res & Technol Hellas Inst Informat Technol Thessaloniki Greece

ISBN: (print) 9781728134017

The problem of video captioning has been heavily investigated by the research community in recent years, especially since Recurrent Neural Networks (RNNs) were introduced. Existing video captioning approaches are usually based on sequence-to-sequence models that aim to exploit the visual information by detecting events or objects, or by matching entities to words. However, the exploitation of the contextual information that can be extracted from the vocabulary has not been investigated yet, except for approaches that make use of parts of speech such as verbs, nouns, and adjectives. The proposed approach is based on the assumption that textually similar captions should represent similar visual content. Specifically, we propose a novel loss function that penalizes wrongly predicted words and rewards correctly predicted words based on the semantic cluster that they belong to. The proposed method is evaluated using two widely known datasets in the video captioning domain, Microsoft Research - Video to Text (MSR-VTT) and Microsoft Research Video Description Corpus (MSVD). Finally, experimental analysis shows that the proposed method outperforms the baseline approach in most cases.

Keywords: video captioning; Word2Vec; textual information; encoder-decoder; Recurrent Neural Network (RNN)

Building a mixed-lingual neural TTS system with only monolingual data

Interspeech Conference

Authors: Xue, Liumeng Song, Wei Xu, Guanghui Xie, Lei Wu, Zhizheng Northwestern Polytech Univ Sch Comp Sci Shaanxi Prov Key Lab Speech & Image Informat Proc Xian Peoples R China JDcom Beijing Peoples R China JDcom Santa Clara CA USA

When deploying a Chinese neural Text-to-Speech (TTS) system, one of the challenges is to synthesize Chinese utterances with embedded English phrases or words. This paper looks into the problem in the encoder-decoder framework when only monolingual data from a target speaker is available. Specifically, we view the problem from two aspects: speaker consistency within an utterance and naturalness. We start the investigation with an average voice model which is built from multi-speaker monolingual data, i.e., Mandarin and English data. On the basis of that, we look into speaker embedding for speaker consistency within an utterance and phoneme embedding for naturalness and intelligibility, and study the choice of data for model training. We report the findings and discuss the challenges of building a mixed-lingual TTS system with only monolingual data.

Keywords: speech synthesis; encoder-decoder; mixed-lingual

Exact-K Recommendation via Maximal Clique Optimization

25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD)

Authors: Gong, Yu Zhu, Yu Duan, Lu Liu, Qingwen Guan, Ziyu Sun, Fei Ou, Wenwu Zhu, Kenny Q. Alibaba Grp Hangzhou Zhejiang Peoples R China Zhejiang Cainiao Supply Chain Management Co Ltd Hangzhou Zhejiang Peoples R China Xidian Univ Xian Shaanxi Peoples R China Shanghai Jiao Tong Univ Shanghai Peoples R China

ISBN: (print) 9781450362016

This paper targets a novel but practical recommendation problem named exact-K recommendation. It differs from traditional top-K recommendation in that it focuses on (constrained) combinatorial optimization, which recommends a whole set of K items called a card, rather than on ranking optimization, which assumes that "better" items should be put into top positions. We take the first step by giving a formal problem definition and reduce it to Maximum Clique Optimization on a graph. To tackle this specific combinatorial optimization problem, which is NP-hard, we propose Graph Attention Networks (GAttN) with a Multi-head Self-attention encoder and a decoder with an attention mechanism. It can learn the joint distribution of the K items end-to-end and generate an optimal card rather than rank individual items by prediction scores. We then propose Reinforcement Learning from Demonstrations (RLfD), which combines the advantages of behavior cloning and reinforcement learning, making it sufficient and efficient to train the model. Extensive experiments on three datasets demonstrate the effectiveness of our proposed GAttN with RLfD method, which outperforms several strong baselines with a relative improvement of 7.7% and 4.7% on average in Precision and Hit Ratio respectively, and achieves state-of-the-art (SOTA) performance for the exact-K recommendation problem.

Keywords: recommender system; exact-K recommendation; learning-to-rank; reinforcement learning; encoder-decoder

Framework of Sequence Chunking for Human Activity Recognition Using Wearables (2019)

International Conference on Image, Video and Signal Processing (IVSP)

Authors: Zhang, Weijia Qin, Le Zhong, Wei Guo, Xuemei Wang, Guoli Sun Yat Sen Univ Guangzhou Guangdong Peoples R China

ISBN: (print) 9781450361750

Human activity recognition (HAR) is a main research area in ubiquitous computing, and most existing approaches are based on the frameworks of sliding window segmentation and dense labeling. However, these frameworks have some problems. For example, sliding window segmentation causes label inconsistency, and dense labeling cannot explicitly model the relationship between activities. In this paper, we propose a new framework to deal with these problems, in which HAR is treated as a sequence chunking problem and divided into the subtasks of segmentation and labeling. Segmentation splits a raw sequence into chunks that each represent an activity, and labeling predicts the label for each chunk based on the segmentation results. We propose an encoder-decoder model based on convolutional neural networks to implement the proposed framework. The encoder segments a sequence into chunks based on BIO labels, and the decoder treats a chunk as a basic unit to predict the corresponding label. We conduct experiments and show that the proposed model achieves state-of-the-art performance on both the Opportunity and Hand Gesture datasets.

Keywords: Human Activity Recognition; CNN; encoder-decoder; Sequence Chunking
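
The abstract above mentions segmenting a sequence into chunks based on BIO labels. The sketch below illustrates only the common BIO tagging convention (B = begin, I = inside, O = outside), not the paper's model. The function name, the `B-<label>` / `I-<label>` tag spelling, the activity names, and the handling of a stray `I-` tag are all assumptions made for illustration.

```python
def bio_to_chunks(tags):
    """Convert BIO tags to (label, start, end) chunks; end is exclusive."""
    chunks = []
    label, start = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # A chunk continues only on an I- tag that carries the same label.
        continues = prefix == "I" and label is not None and name == label
        if not continues:
            if label is not None:
                chunks.append((label, start, i))
                label, start = None, None
            # B- opens a new chunk. Assumption: a stray I- tag also opens one.
            if prefix in ("B", "I"):
                label, start = name, i
    # Close a chunk that runs to the end of the sequence.
    if label is not None:
        chunks.append((label, start, len(tags)))
    return chunks


# Hypothetical per-timestep tags for a short sensor sequence.
tags = ["B-walk", "I-walk", "I-walk", "O", "B-sit", "I-sit", "B-walk"]
chunks = bio_to_chunks(tags)
```

Each resulting chunk can then be treated as a single unit for labeling, which is the idea the abstract contrasts with dense per-sample labeling.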

Coastal Land Cover Classification of High-Resolution Remote Sensing Images Using Attention-Driven Context Encoding Network

SENSORS, 2020, Vol. 20, No. 24, pp. 7032-7032

Authors: Chen, Jifa Chen, Gang Wang, Lizhe Fang, Bo Zhou, Ping Zhu, Mingjie China Univ Geosci Coll Marine Sci & Technol Wuhan 430074 Peoples R China China Univ Geosci Key Lab Geol Survey & Evaluat Minist Educ Wuhan 430074 Peoples R China China Univ Geosci Sch Comp Sci Wuhan 430074 Peoples R China

Ground objects in the coastal zone exhibit low inter-class variance and complex spatial details, which makes coastal land cover classification (CLCC) from high-resolution remote sensing images a challenging task. Recently, fully convolutional neural networks have been widely used in CLCC. However, the inherent structure of the convolutional operator limits the receptive field, so that only the local context is captured. Additionally, complex decoders bring information redundancy and computational burden. Therefore, this paper proposes a novel attention-driven context encoding network to solve these problems. Lightweight global feature attention modules are employed to aggregate multi-scale spatial details in the decoding stage. Meanwhile, position and channel attention modules with long-range dependencies are embedded to enhance feature representations of specific categories by capturing the multi-dimensional global context. Additionally, multiple objective functions are introduced to supervise and optimize feature information at specific scales. We apply the proposed method to CLCC tasks in two study areas and compare it with other state-of-the-art approaches. Experimental results indicate that the proposed method performs best among the compared approaches in encoding long-range context and recognizing spatial details, and obtains the best scores on the evaluation indexes.

Keywords: coastal zone; land cover classification; semantic segmentation; encoder-decoder; context encoding; attention mechanism

IMPROVING GRAPHEME-TO-PHONEME CONVERSION BY INVESTIGATING COPYING MECHANISM IN RECURRENT ARCHITECTURES

IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Authors: Niranjan, Abhishek Shaik, M. Ali Basha Samsung Res & Dev Inst Voice Intelligence Bangalore Karnataka India

ISBN: (print) 9781728103068

Attention-driven encoder-decoder architectures have become highly successful in various sequence-to-sequence learning tasks. We propose a copy-augmented Bi-directional Long Short-Term Memory based encoder-decoder architecture for Grapheme-to-Phoneme conversion. In the Grapheme-to-Phoneme task, a number of character units in words possess a high degree of similarity with some phoneme unit(s). Thus, we attempt to capture this characteristic using a copy-augmented architecture. Our proposed model automatically learns to generate phoneme sequences during inference by copying source token embeddings to the decoder's output in a controlled manner. To our knowledge, this is the first time copy-augmentation has been investigated for the Grapheme-to-Phoneme conversion task. We validate our experiments on the accented and non-accented publicly available CMU-Dict datasets and achieve state-of-the-art performance in terms of both phoneme and word error rates. Further, we verify the applicability of our proposed approach on a Hindi Lexicon and show that our model outperforms all recent state-of-the-art results.

Keywords: Grapheme-to-Phoneme; Copy augmentation; encoder-decoder; attention
