Search results - Inner Mongolia University Library

Enhance audio-visual segmentation with hierarchical encoder and audio guidance

NEUROCOMPUTING, 2024, Vol. 594

Authors: Guo, Cunhan; Huang, Heyan; Zhou, Yanghao. Affiliations: Univ Chinese Acad Sci, Sch Emergency Management Sci & Engn, 1 Yanqihu East Rd, Beijing 101400, Peoples R China; Beijing Inst Technol, Southeast Acad Informat Technol, 1998 Licheng Middle Ave, Putian 351100, Fujian, Peoples R China; Beijing Inst Technol, Sch Comp Sci & Technol, 5 Zhongguancun South St, Beijing 101400, Peoples R China

As one of the pivotal technologies leading towards embodied intelligence, audio-visual segmentation aims at precise segmentation of sounding objects, with broad application prospects in scenarios such as emergency rescue and natural exploration. Nevertheless, its performance is limited by challenges in the adaptation and fusion of cross-modal information during encoding, and in the decoding and generation of masks. To address these issues, this paper explores the adaptation of multi-modal information based on a shared encoder, employing a neural architecture search method to design a hierarchical encoder cooperation module for enhanced information interaction. An intermediate loss is used to help the encoder preserve spatial knowledge. Furthermore, an audio-guided class-aware decoder is devised to guide the generation of masks. Our approach has yielded competitive experimental results across multiple datasets, substantiating its effectiveness.

Keywords: Audio-visual segmentation; Hierarchical encoder; Neural architecture search; Audio guidance

Table-to-Text Generation via Row-Aware Hierarchical Encoder

18th China National Conference on Computational Linguistics (CCL)

Authors: Gong, Heng; Feng, Xiaocheng; Qin, Bing; Liu, Ting. Affiliation: Harbin Inst Technol, Res Ctr Social Comp & Informat Retrieval, Harbin, Peoples R China

ISBN: (print) 9783030323813; 9783030323806

In this paper, we present a neural model to map structured table into document-scale descriptive texts. Most existing neural network based approaches encode a table record-by-record and generate long summaries by attentional encoder-decoder model, which leads to two problems. (1) portions of the generated texts are incoherent due to the mismatch between the row and corresponding records. (2) a lot of irrelevant information is described in the generated texts due to the incorrect selection of the redundant records. Our approach addresses both problems by modeling the row representation as an intermediate structure of the table. In the encoding phase, we first learn record-level representation via transformer encoder. Afterwards, we obtain each row's representation according to their corresponding records' representation and model row-level dependency via another transformer encoder. In the decoding phase, we first attend to row-level representation to find important rows. Then, we attend to specific records to generate texts. Experiments were conducted on ROTOWIRE, a dataset which aims at producing a document-scale NBA game summary given structured table of game statistics. Our approach improves a strong baseline's BLEU score from 14.19 to 15.65 (+10.29%). Furthermore, three extractive evaluation metrics and human evaluation also show that our model has the ability to select salient records and the generated game summary is more accurate.
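The two-step decoding described in this abstract (attend to rows first, then to records within rows) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper uses transformer encoders, whereas this sketch assumes mean pooling for row representations and plain dot-product attention. The function names and the toy table are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_pool(vectors):
    """Assumption: a row is summarized by averaging its record vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def hierarchical_attention(query, table):
    """table is a list of rows; each row is a list of record vectors.

    Returns the row-level weights and the combined record-level weights,
    where each record weight is scaled by the weight of its row.
    """
    row_reprs = [mean_pool(row) for row in table]
    row_weights = softmax([dot(query, r) for r in row_reprs])
    record_weights = []
    for row, rw in zip(table, row_weights):
        within_row = softmax([dot(query, rec) for rec in row])
        record_weights.append([rw * w for w in within_row])
    return row_weights, record_weights

# Toy table: two rows, two records each, 2-dimensional record vectors.
table = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9]],
]
row_w, rec_w = hierarchical_attention([1.0, 0.0], table)
total_weight = sum(sum(r) for r in rec_w)
```

Because each record weight is the product of a row weight and a within-row weight, the combined weights still sum to one, and records in rows judged unimportant receive little attention.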

Keywords: Table-to-Text generation; Seq2Seq; Hierarchical encoder

Table-to-Text Generation via Row-Aware Hierarchical Encoder

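18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China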

Authors: Heng Gong; Xiaocheng Feng; Bing Qin; Ting Liu. Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China

ISBN: (print) 9783030323806

In this paper, we present a neural model to map structured table into document-scale descriptive texts. Most existing neural network based approaches encode a table record-by-record and generate long summaries by attentional encoder-decoder model, which leads to two problems. (1) portions of the generated texts are incoherent due to the mismatch between the row and corresponding records. (2) a lot of irrelevant information is described in the generated texts due to the incorrect selection of the redundant records. Our approach addresses both problems by modeling the row representation as an intermediate structure of the table. In the encoding phase, we first learn record-level representation via transformer encoder. Afterwards, we obtain each row's representation according to their corresponding records' representation and model row-level dependency via another transformer encoder. In the decoding phase, we first attend to row-level representation to find important rows. Then, we attend to specific records to generate texts. Experiments were conducted on ROTOWIRE, a dataset which aims at producing a document-scale NBA game summary given structured table of game statistics. Our approach improves a strong baseline's BLEU score from 14.19 to 15.65 (+10.29%). Furthermore, three extractive evaluation metrics and human evaluation also show that our model has the ability to select salient records and the generated game summary is more accurate.

Keywords: Table-to-Text Generation; Seq2Seq; Hierarchical encoder

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

Interspeech Conference

Authors: Chen, Jie; Song, Changhe; Tuo, Deyi; Wu, Xixin; Kang, Shiyin; Wu, Zhiyong; Meng, Helen. Affiliations: Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China; XVerse Inc, Shenzhen, Peoples R China; Huya Inc, Guangzhou, Peoples R China; Chinese Univ Hong Kong, Hong Kong, Peoples R China

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence the speech interpretation of the target utterance, previous works on PSP mainly focus on utilizing intra-utterance linguistic information of the current utterance only. This work proposes to use inter-utterance linguistic information to improve the performance of PSP. Multi-level contextual information, which includes both inter-utterance and intra-utterance linguistic information, is extracted by a hierarchical encoder from the character level, utterance level and discourse level of the input text. Then a multi-task learning (MTL) decoder predicts prosodic boundaries from the multi-level contextual information. Objective evaluation results on two datasets show that our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH) boundaries, demonstrating the effectiveness of multi-level contextual information for PSP. Subjective preference tests also indicate that the naturalness of the synthesized speech is improved.

Keywords: prosodic structure prediction; multi-level contextual information; hierarchical encoder

Towards Open World Traffic Classification

23rd International Conference on Information and Communications Security (ICICS)

Authors: Liu, Zhu; Cai, Lijun; Zhao, Lixin; Yu, Aimin; Meng, Dan. Affiliations: Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China; Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China

ISBN: (print) 9783030868901; 9783030868895

Due to the dynamic evolution of network traffic, open world traffic classification has become a vital problem. Traditional traffic classification methods have achieved some success but fail at unknown traffic detection because they assume a closed world. Existing techniques for unknown traffic detection suffer from unsatisfactory accuracy and robustness because they are not designed around the hierarchical structure of network flows. Meanwhile, the diverse flow patterns within the same attacks and the similar flow patterns across different attacks lead to hard examples, which degrade classification performance. As a solution, we present a Siamese Hierarchical Encoder Network (SHE-Net) for traffic classification in an open world setting. We introduce a hierarchical encoder mechanism which deeply mines the potential sequential and spatial characteristics of traffic, and adopt a siamese structure with a newly designed complementary loss function which focuses on mining hard paired examples and speeds up convergence. Together, these two key designs learn intra-class compactness and inter-class separateness in the feature space to set aside more space for unknown traffic. Our comprehensive experiments on real-world datasets covering intrusion detection and malware detection indicate that SHE-Net achieves excellent performance and outperforms the state-of-the-art methods.

Keywords: Open world; Traffic classification; Hierarchical encoder

RKC-H: A Rich Knowledge Based Model for Multi-turn Dialogue Generation

14th International Symposium on Theoretical Aspects of Software Engineering (TASE)

Authors: Xu, Feifei; Ding, Guanqun; Zhang, Wenkai Audrey. Affiliation: Shanghai Univ Elect Power, Sch Comp Sci & Technol, Shanghai, Peoples R China

ISBN: (print) 9781728140865

In conversational communication, people often draw upon their rich world knowledge in addition to the dialogue context. Commonsense world facts can facilitate natural language understanding. In this paper, we present a rich knowledge cognition hierarchical (RKC-H) multi-turn dialogue model for the open domain to improve language generation. Given the input, the model selects the corresponding seed-graphs and encodes the seed-graph nodes with a seed-graph attention mechanism. Then, the hierarchical encoder captures multi-granularity text features of the current utterance and the dialogue history. We apply a graph-to-sequence generator to produce the responses and provide an Exponential Maximum Mutual Information loss function. Automatic and human evaluations show that the proposed model can produce meaningful and coherent multi-turn dialogue. Our model outperforms the baseline.

Keywords: multi-turn dialogue system; knowledge graph; graph attention; hierarchical encoder

ConvUNET: a Novel Depthwise Separable ConvNet for Lung Nodule Segmentation

2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023

Authors: Tang, Xinkai; Liu, Feng; Kong, Ruoshan; Luo, Fei; Huang, Wencai; Zou, Jiani. Affiliations: Wuhan University, School of Computer Science, Wuhan, China; General Hospital of Central Theater Command of the PLA, Department of Radiology, Wuhan, China

ISBN: (print) 9798350337488

Lung nodule segmentation is usually considered a 3D semantic segmentation task. Due to the small size, diverse morphology, and low recognition of lung nodules, it is hard to segment any nodule precisely. To solve this problem, we propose a lightweight depthwise separable convolutional network named ConvUNET, which consists of a hierarchical encoder and a U-shaped decoder. Compared with some Transformer-based models (e.g., SwinUNETR) and ConvNeXt-based models (e.g., 3D UX-Net), our model has the advantages of fewer parameters, faster inference speed, and higher accuracy. We test the segmentation performance on the LUNA-16 and LNDb-19 datasets using standard 5-fold cross-validations, and the proposed method achieves competitive dice scores of 88.90% and 84.16%, respectively. Besides, it also shows considerable precision in segmenting lung nodules with diverse characteristics. Our source code is available at https://***/Xinkai-Tang/ConvUNET. © 2023 IEEE.
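For readers unfamiliar with the Dice score reported in this abstract, the following minimal Python sketch shows the standard definition, 2|A∩B| / (|A| + |B|), on flattened binary masks. It is an illustration only: the function name and toy masks are hypothetical, and the paper's actual evaluation code (for 3D volumes under 5-fold cross-validation) is not reproduced here.

```python
def dice_score(pred, target):
    """Dice coefficient between two flattened binary masks (lists of 0/1).

    Returns 1.0 when both masks are empty, a common convention that is
    assumed here rather than taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    if denom == 0:
        return 1.0
    return 2.0 * intersection / denom

# Toy example: 5 voxels, 3 predicted positive, 3 truly positive, 2 shared.
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
score = dice_score(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

A score of 88.90% in the abstract's terms therefore means that, on average, the overlap between predicted and reference nodule voxels is large relative to their combined size.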

Keywords: depthwise separable convolutional network; hierarchical encoder; lung nodule segmentation; U-shaped decoder
