Search Results - Inner Mongolia University Library

Capturing Global Structural Information in Long Document Question Answering with Compressive Graph Selector Network

2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

Authors: Nie, Yuxiang; Huang, Heyan; Wei, Wei; Mao, Xian-Ling

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, China; Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, China; Beijing Institute of Technology Southeast Academy of Information Technology, China; Huazhong University of Science and Technology, China

Long document question answering is a challenging task because it demands complex reasoning over long text. Previous works usually treat long documents as unstructured flat text or consider only the local structure in long documents. However, these methods usually ignore the global structure of the long document, which is essential for long-range understanding. To tackle this problem, we propose the Compressive Graph Selector Network (CGSN) to capture the global structure in a compressive and iterative manner. The proposed model mainly focuses on the evidence selection phase of long document question answering. Specifically, it consists of three modules: a local graph network, a global graph network and an evidence memory network. First, the local graph network builds the graph structure of each chunked segment at the token, sentence, paragraph and segment levels to capture the short-term dependencies of the text. Second, the global graph network selectively receives the information of each level from the local graph, compresses it into the global graph nodes and applies graph attention to the global graph nodes to build long-range reasoning over the entire text in an iterative way. Third, the evidence memory network is designed to alleviate the redundancy problem in evidence selection by saving the results selected in previous steps. Extensive experiments show that the proposed model outperforms previous methods on two datasets. © 2022 Association for Computational Linguistics.
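
A rough illustration of the global-graph step described above: the sketch mean-pools ("compresses") hypothetical per-segment sentence vectors into global nodes, then runs one round of scaled dot-product attention restricted to graph neighbours. This is a simplified stand-in for graph attention (GAT-style layers use learned projections and a learned scoring function), not CGSN's implementation; `compress`, `graph_attention` and the toy vectors are assumptions for illustration only.

```python
import math
from typing import Dict, List

Vector = List[float]


def compress(vectors: List[Vector]) -> Vector:
    """Hypothetical compression step: mean-pool lower-level vectors into one global node."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(nodes: Dict[str, Vector], edges: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One round of scaled dot-product attention restricted to each node's neighbours.

    Each node attends to itself plus the neighbours listed in `edges` and is replaced
    by the attention-weighted sum of those nodes' vectors (no learned weights here).
    """
    dim = len(next(iter(nodes.values())))
    scale = 1.0 / math.sqrt(dim)
    updated: Dict[str, Vector] = {}
    for name, query in nodes.items():
        neighbourhood = [name] + [n for n in edges.get(name, []) if n != name]
        scores = [scale * sum(q * k for q, k in zip(query, nodes[n])) for n in neighbourhood]
        weights = softmax(scores)
        updated[name] = [
            sum(w * nodes[n][d] for w, n in zip(weights, neighbourhood)) for d in range(dim)
        ]
    return updated


# Toy usage: two segments, each summarised from made-up sentence vectors.
global_nodes = {
    "seg1": compress([[1.0, 0.0], [0.8, 0.2]]),
    "seg2": compress([[0.0, 1.0], [0.2, 0.8]]),
}
links = {"seg1": ["seg2"], "seg2": ["seg1"]}
print(graph_attention(global_nodes, links))
```

The iterative chunk-by-chunk update and the evidence memory are deliberately left out to keep the example small.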

Keywords: Iterative methods

USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER

arXiv, 2023

Authors: Ma, Jun-Yu; Gu, Jia-Chen; Qi, Jiajun; Ling, Zhen-Hua; Liu, Quan; Zhao, Xiaoyi

Affiliations: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, China; State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China; Communication University of China, China

This paper describes the system developed by the USTC-NELSLIP team for SemEval-2023 Task 2, Multilingual Complex Named Entity Recognition (MultiCoNER II). A method named Statistical Construction and Dual Adaptation of Gazetteer (SCDAG) is proposed for multilingual complex NER. The method first uses a statistics-based approach to construct a gazetteer. Second, the representations of the gazetteer network and the language model are adapted by minimizing the KL divergence between them at both the sentence level and the entity level. Finally, the two networks are integrated for supervised named entity recognition (NER) training. The proposed method is applied to XLM-R with a gazetteer built from Wikidata and shows strong generalization ability across different tracks. Experimental results and detailed analysis verify the effectiveness of the proposed method. The official results show that our system ranked 1st on one track (Hindi) in this task. Copyright © 2023, The Authors. All rights reserved.
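
The dual-adaptation step minimizes a KL divergence between the gazetteer network and the language model at two granularities. A minimal plain-Python sketch of such a loss, assuming each network produces logits over the same label set once per sentence and once per entity; the function names, the KL direction, the averaging over entities and the equal weighting of the two levels are assumptions, not details from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def dual_adaptation_loss(
    lm_sentence: Sequence[float],
    gaz_sentence: Sequence[float],
    lm_entities: Sequence[Sequence[float]],
    gaz_entities: Sequence[Sequence[float]],
) -> float:
    """Hypothetical sentence-level + entity-level KL term between two networks' outputs."""
    sentence_term = kl_divergence(softmax(gaz_sentence), softmax(lm_sentence))
    entity_terms = [
        kl_divergence(softmax(g), softmax(l)) for l, g in zip(lm_entities, gaz_entities)
    ]
    entity_term = sum(entity_terms) / len(entity_terms) if entity_terms else 0.0
    return sentence_term + entity_term


print(dual_adaptation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.5], [[1.0, 0.0]], [[0.2, 0.9]]))
```

The loss is zero exactly when both networks agree at both levels, which is the property the adaptation step relies on.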

Keywords: Complex networks

Alternating Objectives Generates Stronger PGD-Based Adversarial Attacks

arXiv, 2022

Authors: Antoniou, Nikolaos; Georgiou, Efthymios; Potamianos, Alexandros

Affiliations: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece; Institute for Language and Speech Processing, Athena Research Center, Athens, Greece

Designing powerful adversarial attacks is of paramount importance for the evaluation of ℓp-bounded adversarial defenses. Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms for generating such adversaries. The search space of PGD is dictated by the steepest-ascent directions of an objective. Despite the plethora of objective function choices, there is no universally superior option, and robustness overestimation may arise from ill-suited objective selection. Driven by this observation, we postulate that combining different objectives through a simple loss-alternating scheme renders PGD more robust to design choices. We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different ℓ∞-robust models and 3 datasets. The performance improvement over the single-loss counterparts is consistent. On the CIFAR-10 dataset, our strongest adversarial attack outperforms all of the white-box components of the AutoAttack (AA) ensemble [1], as well as the most powerful attacks existing in the literature, achieving state-of-the-art results within the computational budget of our study (T = 100, no restarts). Copyright © 2022, The Authors. All rights reserved.
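
The loss-alternating idea can be sketched as ordinary ℓ∞ PGD whose ascent direction comes from a different objective on each iteration. This toy is an illustration under stated assumptions: a hypothetical two-class linear model `W`, two common objectives (cross-entropy and a CW-style logit margin) with gradients written out analytically, and a simple round-robin schedule. The paper's actual objectives and schedule may differ, and the usual clipping to the valid input range (e.g. [0, 1] for images) is omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Objective = Callable[[Vector, int], Vector]  # returns d(loss)/dx for a given label

# Hypothetical toy model: a two-class linear classifier with logits z = W x.
W = [[1.0, -1.0], [-1.0, 1.0]]


def logits(x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ce_grad(x: Vector, label: int) -> Vector:
    """Gradient of the cross-entropy loss w.r.t. x: W^T (softmax(z) - onehot(label))."""
    z = logits(x)
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    delta = [e / total - (1.0 if k == label else 0.0) for k, e in enumerate(exps)]
    return [sum(delta[k] * W[k][d] for k in range(len(W))) for d in range(len(x))]


def margin_grad(x: Vector, label: int) -> Vector:
    """Gradient of the margin loss max_{j != label} z_j - z_label w.r.t. x."""
    z = logits(x)
    other = max((k for k in range(len(z)) if k != label), key=lambda k: z[k])
    return [W[other][d] - W[label][d] for d in range(len(x))]


def sign(v: float) -> float:
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def alternating_pgd(
    x0: Sequence[float], label: int, objectives: Sequence[Objective],
    eps: float, alpha: float, steps: int,
) -> Vector:
    """l_inf PGD that cycles through `objectives`, one per iteration (round-robin)."""
    x = list(x0)
    for t in range(steps):
        grad = objectives[t % len(objectives)](x, label)
        x = [xi + alpha * sign(g) for xi, g in zip(x, grad)]  # signed ascent step
        # Project back onto the l_inf ball of radius eps around the clean input.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


print(alternating_pgd([1.0, 0.0], 0, [ce_grad, margin_grad], eps=0.75, alpha=0.25, steps=4))
# → [0.25, 0.75]
```

In the toy run the clean input is classified as class 0, and the perturbed point (still within the ε-ball) is classified as class 1.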

Keywords: Budget control

End-to-End Voice Conversion with Information Perturbation

International Symposium on Chinese Spoken Language Processing

Authors: Qicong Xie; Shan Yang; Yi Lei; Lei Xie; Dan Su

Affiliations: Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi’an, China; Tencent AI Lab, China

ISBN (print): 9798350397970

The ideal goal of voice conversion is to convert the source speaker’s speech so that it sounds natural and like the target speaker while maintaining the linguistic content and the prosody of the source speech. However, current approaches are insufficient to achieve comprehensive source prosody transfer and target speaker timbre preservation in the converted speech, and the quality of the converted speech is also unsatisfactory due to the mismatch between the acoustic model and the vocoder. In this paper, we leverage recent advances in information perturbation and propose a fully end-to-end approach to high-quality voice conversion. We first adopt information perturbation to remove speaker-related information from the source speech, disentangling speaker timbre from linguistic content, so that the linguistic information can subsequently be modeled by a content encoder. To better transfer the prosody of the source speech to the target, we introduce a speaker-related pitch encoder that maintains the general pitch pattern of the source speaker while flexibly modifying the pitch intensity of the generated speech. Finally, one-shot voice conversion is enabled through continuous speaker space modeling. Experimental results indicate that the proposed end-to-end approach significantly outperforms state-of-the-art models in terms of intelligibility, naturalness, and speaker similarity.
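
The paper's pitch encoder is learned; as a simpler reference point, the sketch below shows the common log-F0 mean/variance conversion, which likewise keeps the shape of the source pitch contour while moving it into the target speaker's range. The `intensity` factor is a hypothetical knob that scales the contour's excursions (1.0 keeps them, 0.0 flattens them), only loosely mirroring the pitch-intensity control in the abstract. F0 values of 0 are assumed to mark unvoiced frames.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def log_f0_stats(f0: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    voiced = [math.log(f) for f in f0 if f > 0]
    return mean(voiced), pstdev(voiced)


def convert_pitch(
    src_f0: Sequence[float], tgt_mean: float, tgt_std: float, intensity: float = 1.0
) -> List[float]:
    """Map a source F0 contour (Hz) into the target speaker's log-F0 range."""
    src_mean, src_std = log_f0_stats(src_f0)
    converted = []
    for f in src_f0:
        if f <= 0:
            converted.append(0.0)  # unvoiced frames stay unvoiced
            continue
        z = (math.log(f) - src_mean) / src_std if src_std > 0 else 0.0
        converted.append(math.exp(tgt_mean + intensity * tgt_std * z))
    return converted


# Hypothetical target statistics, e.g. estimated from a reference utterance.
tgt_mean, tgt_std = log_f0_stats([220.0, 250.0, 0.0, 280.0])
print(convert_pitch([110.0, 125.0, 0.0, 140.0], tgt_mean, tgt_std))
```

Working in the log domain makes the mapping a pitch-ratio shift, so a contour an octave below the target range is simply moved up an octave.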

Keywords: Perturbation methods; Vocoders; Linguistics; Speech; Acoustics; Timbre

IQDubbing: Prosody Modeling Based on Discrete Self-Supervised Speech Representation for Expressive Voice Conversion

arXiv, 2022

Authors: Gan, Wendong; Wen, Bolong; Yan, Ying; Chen, Haitao; Wang, Zhichao; Du, Hongqiang; Xie, Lei; Guo, Kaixuan; Li, Hai

Affiliations: IQIYI Inc, Chengdu, China; Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi'an, China

Prosody modeling is important but still challenging in expressive voice conversion: prosody is difficult to model, and other factors entangled with prosody in speech, e.g., speaker, environment and content, should be removed during prosody modeling. In this paper, we present IQDubbing to solve this problem for expressive voice conversion. To model prosody, we leverage recent advances in discrete self-supervised speech representation (DSSR). Specifically, a prosody vector is first extracted from a pre-trained VQWav2Vec model, in which rich prosody information is embedded while most speaker and environment information is effectively removed by quantization. To further filter out redundant non-prosodic information, such as content and partial speaker information, we propose two kinds of prosody filters to sample prosody from the prosody vector. Experiments show that IQDubbing is superior to baseline and comparison systems in terms of speech quality while maintaining prosody consistency and speaker similarity. Copyright © 2022, The Authors. All rights reserved.
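
Quantization is what strips speaker and environment detail in this pipeline: each continuous frame is replaced by its nearest codebook entry, so fine-grained variation within a cell is discarded. The sketch shows plain nearest-neighbour (k-means-style) vector quantization with a made-up 2-D codebook. vq-wav2vec itself quantizes learned features with grouped Gumbel-softmax or k-means quantizers, and the prosody filters are not modelled here.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Replace each frame with its nearest codebook vector (squared Euclidean distance)."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        best = min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Made-up codebook and frames for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, 0.2], [0.9, 0.7], [0.2, 0.8]]
print(quantize(frames, codebook))
```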

Keywords: Machine learning

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

arXiv, 2022

Authors: Hu, Pengfei; Zhang, Zhenrong; Zhang, Jianshu; Du, Jun; Wu, Jiajia

Affiliations: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China; iFLYTEK Research, China

Table of contents (ToC) extraction aims to extract headings of different levels in documents to better understand the outline of the contents, which can be widely used for document understanding and information retrieval. Existing works often use hand-crafted features and predefined rule-based functions to detect headings and resolve the hierarchical relationships between them. Both benchmarks and deep-learning-based research remain limited. Accordingly, in this paper, we first introduce a standard dataset, HierDoc, containing image samples from 650 scientific-paper documents with their content labels. We then propose a novel end-to-end model using a multimodal tree decoder (MTD) for ToC extraction as a benchmark for HierDoc. The MTD model is mainly composed of three parts: an encoder, a classifier, and a decoder. The encoder fuses multimodal features of vision, text, and layout information for each entity of the document. The classifier then recognizes and selects the heading entities. Next, to parse the hierarchical relationships between the heading entities, a tree-structured decoder is designed. To evaluate performance, both tree-edit-distance-based similarity (TEDS) and F1-measure are adopted. Finally, our MTD approach achieves an average TEDS of 87.2% and an average F1-measure of 88.1% on the test set of HierDoc. The code and dataset will be released at: https://***/Pengfei-Hu/MTD. Copyright © 2022, The Authors. All rights reserved.
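
TEDS is commonly defined as 1 - EditDistance(T_a, T_b) / max(|T_a|, |T_b|), where |T| counts tree nodes. The sketch computes it for small ToC trees with the textbook recursive tree edit distance (rightmost-root decomposition, memoized) and unit costs for insertion, deletion and relabelling. The unit relabelling cost and the `(label, children)` tuple encoding are assumptions, and the paper's exact cost function may differ.

```python
from functools import lru_cache
from typing import Tuple

Tree = Tuple[str, Tuple["Tree", ...]]  # (heading label, child subtrees)


def size(forest) -> int:
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g) -> int:
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    delete_v = forest_distance(f[:-1] + v_children, g) + 1
    insert_w = forest_distance(f, g[:-1] + w_children) + 1
    match = (
        forest_distance(v_children, w_children)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1)  # relabel cost
    )
    return min(delete_v, insert_w, match)


def teds(a: Tree, b: Tree) -> float:
    n = max(size((a,)), size((b,)))
    return 1.0 - forest_distance((a,), (b,)) / n


gold = ("Doc", (("1 Intro", ()), ("2 Method", (("2.1 Model", ()),))))
pred = ("Doc", (("1 Intro", ()), ("2 Method", ())))
print(teds(gold, pred))  # → 0.75
```

Memoization keeps this fine for ToC-sized trees; large trees would call for an optimized algorithm such as Zhang-Shasha.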

Keywords: Decoding

Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension

arXiv, 2024

Authors: Wang, Chenxu; Jian, Ping; Yang, Zhen

Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing Institute of Technology, Beijing, China

Logical reading comprehension is a challenging task that involves understanding the underlying semantics of text and applying reasoning to deduce the correct answer. Prior research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation. However, previous work constructing chain-of-thought rationales concentrates solely on analyzing correct options, neglecting the incorrect alternatives. Additionally, earlier efforts at data augmentation by altering contexts rely on rule-based methods, which produce contexts that lack diversity and coherence. To address these issues, we propose a Premise-Oriented Data Augmentation (PODA) framework. This framework can generate CoT rationales that include analyses of both correct and incorrect options, while constructing diverse and high-quality counterfactual contexts from incorrect candidate options. We integrate summarizing premises and identifying premises for each option into the rationales. Subsequently, we employ multi-step prompts with the identified premises to construct counterfactual contexts. To help the model better differentiate the reasoning processes associated with each option, we introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples. Experimental results on three representative LLMs demonstrate that our method substantially improves the baselines on two challenging logical reasoning benchmarks (ReClor and LogiQA 2.0). © 2024, CC BY.
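
Thought-path contrastive learning compares reasoning paths from original and counterfactual samples. One common way to express such a comparison is an InfoNCE-style loss; the sketch assumes each reasoning path has already been encoded as a vector, and that an anchor path should be pulled towards a positive path and pushed away from counterfactual paths. Which paths act as positives and negatives, the cosine similarity and the temperature value are all assumptions, not details given in the abstract.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """-log( exp(s+/t) / (exp(s+/t) + sum_k exp(s_k/t)) ) with cosine similarities s."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Hypothetical encodings: anchor path, a matching path, and one counterfactual path.
print(info_nce([1.0, 0.2], [0.9, 0.3], [[-0.5, 1.0]]))
```

The loss is small when the anchor is closer to its positive than to every counterfactual path, and grows as a counterfactual path becomes the better match.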

Keywords: Contrastive Learning

Anchored Monotonic Alignment and Representation Substitution for Rare Spontaneous Behaviors in Spontaneous Speech Synthesis

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Ning-Qian Wu; Ya-Jun Hu; Liping Chen; Zhen-Hua Ling

Affiliations: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P.R. China; iFLYTEK Research, iFLYTEK Co. Ltd., China; MoE Key Laboratory of Brain-Inspired Intelligent Perception and Cognition, University of Science and Technology of China

ISBN (electronic): 9798350368741

ISBN (print): 9798350368758

Spontaneous behaviors in speech pose significant challenges for speech synthesis. Existing research has not adequately addressed these behaviors, with most studies relying on specially recorded datasets. In contrast, real-world data more accurately reflects the natural, spontaneous speaking styles in everyday life and encompasses a wider range of spontaneous behaviors. However, such data is often of lower quality, and the distribution of spontaneous behaviors is highly imbalanced. In this study, we explore spontaneous speech synthesis using real-world data within the VITS2 framework. To overcome these challenges, we introduce two techniques: anchored monotonic alignment and spontaneous hidden representation substitution. Experimental results demonstrate that these methods enhance model alignment and improve the naturalness of the generated speech. Our proposed approach successfully addresses the challenge of synthesizing rare spontaneous behaviors and offers users flexible control over the synthesized speech.
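
VITS-family models learn text-to-speech alignment with monotonic alignment search (MAS), a dynamic program that assigns every acoustic frame to exactly one text token without ever moving backwards. The sketch implements standard MAS over a log-likelihood matrix. The `anchors` argument is a hypothetical illustration of what "anchoring" could mean (forcing given frames onto given tokens by masking the matrix); the abstract does not describe how the paper's anchored variant actually works.

```python
import math
from typing import Dict, List, Optional, Sequence


def monotonic_alignment_search(
    log_p: Sequence[Sequence[float]], anchors: Optional[Dict[int, int]] = None
) -> List[int]:
    """Return, for each frame j, the index of the text token it is aligned to.

    log_p[i][j] is the log-likelihood of frame j under token i. The path starts at
    token 0, ends at the last token, and moves forward by at most one token per frame.
    `anchors` (hypothetical) maps frame index -> token index that frame must take.
    """
    n_text, n_frames = len(log_p), len(log_p[0])
    if n_frames < n_text:
        raise ValueError("need at least one frame per token")
    neg_inf = -math.inf
    anchors = anchors or {}
    # Mask out every token except the anchored one for anchored frames.
    scores = [
        [v if (j not in anchors or anchors[j] == i) else neg_inf for j, v in enumerate(row)]
        for i, row in enumerate(log_p)
    ]
    q = [[neg_inf] * n_frames for _ in range(n_text)]
    for j in range(n_frames):
        for i in range(min(j + 1, n_text)):  # token i needs at least i earlier frames
            if j == 0:
                best_prev = 0.0  # the path starts at token 0 on frame 0
            else:
                stay = q[i][j - 1]
                advance = q[i - 1][j - 1] if i > 0 else neg_inf
                best_prev = max(stay, advance)
            q[i][j] = scores[i][j] + best_prev
    if q[n_text - 1][n_frames - 1] == neg_inf:
        raise ValueError("no monotonic path satisfies the anchors")
    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        if i > 0 and j > 0 and q[i - 1][j - 1] >= q[i][j - 1]:
            i -= 1
    return path


log_p = [[0.0, 0.0, -5.0], [-5.0, -5.0, 0.0]]
print(monotonic_alignment_search(log_p))                  # → [0, 0, 1]
print(monotonic_alignment_search(log_p, anchors={1: 1}))  # → [0, 1, 1]
```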

Keywords: Training; Accuracy; Large language models; Speech enhancement; Signal processing; Acoustics; Speech synthesis

KnowLogic: A Benchmark for Commonsense Reasoning via Knowledge-Driven Data Synthesis

arXiv, 2025

Authors: Zhan, Weidong; Wang, Yue; Hu, Nan; Xiao, Liming; Ma, Jingyuan; Qin, Yuhang; Li, Zheng; Yang, Yixin; Deng, Sirui; Ding, Jinkun; Ma, Wenhan; Li, Rui; Luo, Weilin; Liu, Qun; Sui, Zhifang

Affiliations: Center for Chinese Linguistics, Department of Chinese Language and Literature, Peking University, China; School of Computer Science, State Key Laboratory of Multimedia Information Processing, Peking University, China; Huawei Noah’s Ark Lab, China

Current evaluations of commonsense reasoning in LLMs are hindered by the scarcity of natural language corpora with structured annotations for reasoning tasks. To address this, we introduce KnowLogic, a benchmark generated through a knowledge-driven synthetic data strategy. KnowLogic integrates diverse commonsense knowledge, plausible scenarios, and various types of logical reasoning. One of the key advantages of KnowLogic is its adjustable difficulty levels, allowing for flexible control over question complexity. It also includes fine-grained labels for in-depth evaluation of LLMs’ reasoning abilities across multiple dimensions. Our benchmark consists of 3,000 bilingual (Chinese and English) questions across various domains, and presents significant challenges for current LLMs, with the highest-performing model achieving only 69.57%. Our analysis highlights common errors, such as misunderstandings of low-frequency commonsense, logical inconsistencies, and overthinking. This approach, along with our benchmark, provides a valuable tool for assessing and enhancing LLMs’ commonsense reasoning capabilities and can be applied to a wide range of knowledge domains. Our data and code can be found at https://***/pokerwf/KnowLogic. Copyright © 2025, The Authors. All rights reserved.
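
Because every KnowLogic question carries fine-grained labels, results can be broken down per dimension rather than reported as a single score. A minimal sketch of that bookkeeping, assuming each evaluated item is a tuple of label strings plus a correctness flag; the label names below are invented for illustration and are not the benchmark's actual label scheme.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[Tuple[str, ...], bool]  # (fine-grained labels, was the answer correct?)


def overall_accuracy(records: Iterable[Record]) -> float:
    results = [ok for _, ok in records]
    return sum(results) / len(results)


def accuracy_by_label(records: Iterable[Record]) -> Dict[str, float]:
    """Accuracy per label; an item with several labels counts towards each of them."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for labels, ok in records:
        for label in labels:
            totals[label] += 1
            correct[label] += int(ok)
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical evaluation records with invented labels.
records = [
    (("deductive", "zh"), True),
    (("deductive", "en"), False),
    (("inductive", "en"), True),
]
print(overall_accuracy(records))
print(accuracy_by_label(records))
```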

Keywords: Natural language processing systems