Search Results - Inner Mongolia University Library

arXiv, 2021

Authors: Zehtab-Salmasi, Aidin; Feizi-Derakhshi, Mohammad-Reza; Balafar, Mohamad-Ali. Affiliations: Computerized Intelligence Systems Laboratory, Department of Computer Engineering, University of Tabriz, Tabriz, Iran; Department of Computer Engineering, University of Tabriz, Tabriz, Iran

Keyword extraction is the process of identifying the words or phrases that best express the main concepts of a text. Electronic infrastructure generates a considerable amount of text around the clock. This massive volume of documents makes it practically impossible for people to study and manage them manually. Nevertheless, these documents need to be accessed efficiently and effectively for numerous purposes. A blog, news article, or technical note is considered a relatively long text, since the reader aims to learn the subject based on keywords or topics. Our approach combines two models: graph centrality features and textual features. The proposed method extracts the best keywords among the candidate keywords using an optimal combination of graph centralities (such as degree, betweenness, eigenvector, and closeness centrality) and textual features (such as casing, term position, term frequency normalization, term different sentence, and part-of-speech tagging). There have also been attempts to distinguish keywords from candidate phrases and consider them as separate keywords. To evaluate the proposed method, seven datasets were used: SemEval2010, SemEval2017, Inspec, fao30, Thesis100, pak2018, and Wikinews, with results reported as precision, recall, and F-measure. The proposed method outperformed the available methods in the literature on these metrics across all reviewed datasets. An increase of approximately 16.9% in F-score was observed, with the largest gains on Inspec among the English datasets and on WikiNews among the non-English datasets. Copyright © 2021, The Authors. All rights reserved.
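
The combination of a graph centrality with a textual feature described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only degree centrality and normalized term frequency, and the function name `score_candidates`, the co-occurrence `window`, and the mixing weight `alpha` are all hypothetical choices made for illustration. The token list is assumed to be non-empty.

```python
from collections import Counter, defaultdict


def score_candidates(tokens, window=2, alpha=0.5):
    """Score each word by a weighted mix of degree centrality and term frequency.

    `window` and `alpha` are illustrative parameters, not values from the paper.
    """
    tf = Counter(tokens)
    neighbours = defaultdict(set)
    # Link every word to the words that follow it within the window.
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    vocab_size = len(tf)
    max_tf = max(tf.values())
    scores = {}
    for word, count in tf.items():
        # Degree centrality: fraction of the other words this word is linked to.
        degree = len(neighbours[word]) / (vocab_size - 1) if vocab_size > 1 else 0.0
        scores[word] = alpha * degree + (1 - alpha) * count / max_tf
    return scores


tokens = "graph centrality ranks graph nodes".split()
scores = score_candidates(tokens)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

A fuller version would add the other centralities and textual features the abstract lists and tune their weights, which this sketch deliberately leaves out.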

Keywords: natural language processing systems

Relation Extraction using Language Model Based on Knowledge Graph

Authors: Chengli Xing; Xueyang Liu; Dongdong Du; Wenhui Hu; Minghui Zhang. Affiliations: SiChuan Tianfu Bank Co. Ltd.; National Engineering Research Centre for Software Engineering, Peking University; China Academy of Industrial Internet; Handan Institute of Innovation, Peking University

Relation extraction is an important task in natural language processing (NLP). Existing methods generally pay more attention to extracting textual semantic information from text, but ignore the relation contextual information carried by existing relations in the datasets, which is very important for the performance of the relation extraction task. In this paper, we represent each individual entity as an embedding based on a knowledge graph of entities and relations, which encodes the relation contextual information between the given entity pairs and relations. In addition, inspired by the recent impressive performance of language models, we use a language model to leverage word semantic information, which it captures better than word embeddings do. Experimental results on the SemEval2010 Task 8 dataset show that the F1-score of our proposed method improves by nearly 3% compared with previous methods.

Keywords: Relation extraction; Knowledge graph; language model

Persuasive explanation of reasoning inferences on dietary data

Joint 6th International Workshop on Dataset PROFILing and Search and the 1st Workshop on Semantic Explainability, PROFILES-SEMEX 2019

Authors: Donadello, Ivan; Dragoni, Mauro; Eccher, Claudio. Affiliation: Fondazione Bruno Kessler, Via Sommarive 18, Trento I-38123, Italy

Explainable AI aims at building intelligent systems that are able to provide a clear, human-understandable justification of their decisions. This holds for both rule-based and data-driven methods. In the management of chronic diseases, the users of such systems are patients who follow strict dietary rules to manage their conditions. After receiving the food intake as input, the system performs reasoning to determine whether the users are following an unhealthy behaviour. Subsequently, the system has to communicate the results in a clear and effective way; that is, the output message has to persuade users to follow the right dietary rules. In this paper, we address the main challenges in building such systems: i) the natural language generation of messages that explain the reasoner inconsistency; ii) the effectiveness of such messages at persuading the users. Results show that the persuasive explanations are able to reduce the users' unhealthy behaviours. Copyright © 2019 for this paper by its authors.

Keywords: natural language processing systems

Modified Complementary Joint Sparse Representations: A Novel Post-Filtering to MVDR Beamforming

33rd IEEE International Workshop on Signal Processing Systems (IEEE SiPS)

Authors: Zhu, Yuanyuan; Fu, Jiafei; Xu, Xu; Ye, Zhongfu. Affiliations: Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei, Peoples R China; Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China

ISBN: (print) 9781728119274

Post-filtering is a popular technique in multichannel speech enhancement systems for further improving speech quality and intelligibility after beamforming. This paper presents a novel post-filter for minimum variance distortionless response (MVDR) beamforming: a single-channel modified complementary joint sparse representations (M-CJSR) method. First, an MVDR beamformer is used to suppress interference and noise. Subsequently, the proposed M-CJSR approach, based on joint dictionary learning, is applied as a single-microphone post-filter to process the beamformer output. Unlike existing post-filtering techniques, which rely on assumptions about the noise field, this algorithm considers a more general signal model that includes ambient noise, such as diffuse or white noise, as well as point-source interference. Moreover, the original CJSR method is extended to jointly learn dictionaries not only for the mappings from mixture to speech and noise, but also for the mapping from mixture to interference. To exploit the complementary advantages of the different sparse representations, we design the weighting parameters based on the residual components of the estimated signals. An experimental study consisting of objective evaluations under various conditions verifies the superiority of the proposed algorithm over other state-of-the-art methods.
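
The MVDR stage that precedes the post-filter can be sketched with the textbook weight formula w = R^{-1} d / (d^H R^{-1} d), where R is the noise covariance matrix and d the steering vector. The sketch below is a two-microphone, single-frequency illustration only; the function name `mvdr_weights_2ch` and the example values of `R` and `d` are made up, and the M-CJSR post-filter itself is not shown.

```python
def mvdr_weights_2ch(R, d):
    """Return MVDR weights for two channels at one frequency bin.

    R is a 2x2 Hermitian noise covariance matrix (nested lists of complex),
    d is the length-2 steering vector toward the target.
    """
    (a, b), (c, e) = R
    det = a * e - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[e / det, -b / det], [-c / det, a / det]]
    r_inv_d = [
        inv[0][0] * d[0] + inv[0][1] * d[1],
        inv[1][0] * d[0] + inv[1][1] * d[1],
    ]
    # Normalizer d^H R^{-1} d enforces the distortionless constraint.
    denom = d[0].conjugate() * r_inv_d[0] + d[1].conjugate() * r_inv_d[1]
    return [x / denom for x in r_inv_d]


# Illustrative (made-up) covariance and steering vector.
R = [[2 + 0j, 0.5 + 0.5j], [0.5 - 0.5j, 1 + 0j]]
d = [1 + 0j, 1j]
w = mvdr_weights_2ch(R, d)
# The beamformer response toward the target, w^H d, should equal 1.
response = w[0].conjugate() * d[0] + w[1].conjugate() * d[1]
print(abs(response - 1) < 1e-9)
```

The final check confirms the "distortionless" property: the target direction passes with unit gain while the weights minimize the output noise power.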

Keywords: Speech enhancement; post-filtering; joint sparse representations; residual weighting

Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph

12th Workshop on Graph-Based Methods for Natural Language Processing, TextGraphs 2018, in conjunction with the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018

Authors: Zhang, Zheng; Yin, Ruiqing; Zweigenbaum, Pierre. Affiliations: LIMSI, CNRS, Université Paris-Saclay, Orsay, France; LRI, Univ. Paris-Sud, CNRS, Université Paris-Saclay, Orsay, France

ISBN: (print) 9781948087254

Corpus2graph is an open-source, NLP-application-oriented Python package that generates a word co-occurrence network from a large corpus. It not only contains different built-in methods to preprocess words, analyze sentences, extract word pairs and define edge weights, but also supports user-customized functions. By using parallelization techniques, it can generate a large word co-occurrence network of the whole English Wikipedia within hours. Thanks to its three-level progressive calculation design (nodes, edges, weights), rebuilding networks with different configurations is even faster, as it does not need to start from scratch. The tool also works with other graph libraries such as igraph, NetworkX and graph-tool, acting as a front end that provides data to speed up network generation. © 2018 Association for Computational Linguistics.
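
To make the idea of a word co-occurrence network concrete, here is a minimal standard-library sketch. It does not use the corpus2graph API; the function name `cooccurrence_edges`, the whitespace tokenization, and the `window` parameter are assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_edges(sentences, window=2):
    """Count undirected word pairs that occur within `window` tokens of each other."""
    edges = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, left in enumerate(tokens):
            # Pair each token with the tokens that follow it inside the window.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    edges[tuple(sorted((left, right)))] += 1
    return edges


edges = cooccurrence_edges(["the cat sat", "the cat ran"])
for (left, right), weight in sorted(edges.items()):
    print(left, right, weight)
```

The resulting weighted edge list can be handed to a graph library; the package described above additionally parallelizes this step and caches the node, edge, and weight stages separately.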

Keywords: Python

Automatic de-identification of medical texts in Spanish: The Meddocan track, corpus, guidelines, methods and evaluation of results

2019 Iberian Languages Evaluation Forum, IberLEF 2019

Authors: Marimon, Montserrat; Gonzalez-Agirre, Aitor; Intxaurrondo, Ander; Rodríguez, Heidy; Martin, Jose Antonio Lopez; Villegas, Marta; Krallinger, Martin. Affiliations: Spain; Spain; Hospital 12 de Octubre, Madrid, Spain

There is increasing interest in exploiting the content of electronic health records by means of natural language processing and text-mining technologies, as they can yield resources for improving patient health and safety, aid clinical decision making, and facilitate drug repurposing or precision medicine. To share, redistribute and make clinical narratives accessible for text-mining research, it is key to fulfill legal conditions and address restrictions related to data protection and patient privacy. Thus, clinical records cannot be shared directly "as is". A necessary precondition for accessing clinical records outside of hospitals is their de-identification, that is, the exhaustive removal or replacement of all mentioned privacy-related protected health information phrases. Providing a proper evaluation scenario for automatic anonymization tools is key for approval of data redistribution. The construction of manually de-identified medical records is currently the main rate- and cost-limiting step for secondary-use applications of clinical data. This paper summarizes the settings, data and results of the first shared track on anonymization of medical documents in Spanish, the MEDDOCAN (Medical Document Anonymization) track. This track relied on a carefully constructed synthetic corpus of clinical case documents, the MEDDOCAN corpus, following annotation guidelines for sensitive data based on an analysis of the EU General Data Protection Regulation. A total of 18 teams (from the 51 registrations) submitted 63 runs for the first sub-track and 61 systems for the second sub-track. The top-scoring systems were based on sophisticated deep learning approaches, representing strategies that can significantly reduce the time and costs associated with accessing textual data containing privacy-related sensitive information. The results of this track might help in lowering the clinical data access hurdle for Spanish language technology developers, showing also potentials for similar setti

Keywords: natural language processing systems

The scientization of literary study

3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH@NAACL-HLT 2019

Authors: Degaetano-Ortlieb, Stefania; Piper, Andrew. Affiliations: Language Science and Technology, Saarland University, Saarbrücken, Germany; Languages, Literatures and Cultures, McGill University, Montreal, Canada

ISBN: (print) 9781950737000

Scholarly practices within the humanities have historically been perceived as distinct from the natural sciences. We look at literary studies, a discipline strongly anchored in the humanities, and hypothesize that over the past half-century literary studies has instead undergone a process of "scientization", adopting linguistic behavior similar to the sciences. We test this using methods based on information theory, comparing a corpus of literary studies articles (around 63,400) with a corpus of standard English and a corpus of scientific English. We show evidence for "scientization" effects in literary studies, though at a more muted level than in scientific English, suggesting that literary studies occupies a middle ground with respect to standard English in the larger space of academic disciplines. More generally, our methodology can be applied to investigate the social positioning and development of language use across different domains (e.g. scientific disciplines, language varieties, registers). © 2019 Association for Computational Linguistics. All rights reserved.

Keywords: Information theory

Scientific Discovery as Link Prediction in Influence and Citation Graphs

Authors: Luo, Fan; Valenzuela-Escárcega, Marco; Hahn-Powell, Gus; Surdeanu, Mihai. Affiliation: University of Arizona, Tucson, AZ, United States

ISBN: (print) 9781948087254

We introduce a machine learning approach for the identification of "white spaces" in scientific knowledge. Our approach addresses this task as link prediction over a graph that contains over 2M influence statements such as "CTCF activates FOXA1", which were automatically extracted using open-domain machine reading. We model this prediction task using graph-based features extracted from this influence graph, as well as from a citation graph that captures scientific communities. We evaluated the proposed approach through backtesting. Although the data is heavily unbalanced (50 times more negative examples than positive ones), our approach predicts which influence links will be discovered in the "near future" with an F1 score of 27 points and a mean average precision of 68%. © 2018 Association for Computational Linguistics.
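
As an illustration of what graph-based features for link prediction can look like, the sketch below computes two classic neighbourhood features for a candidate node pair. The abstract does not list the features the authors used, so common-neighbour count and Jaccard similarity are assumptions here, as are the function name `link_features` and the toy graph (only the node names "CTCF" and "FOXA1" come from the abstract).

```python
def link_features(adjacency, u, v):
    """Return simple neighbourhood-overlap features for the candidate link (u, v).

    `adjacency` maps each node to the set of its neighbours.
    """
    neighbours_u = adjacency.get(u, set())
    neighbours_v = adjacency.get(v, set())
    common = neighbours_u & neighbours_v
    union = neighbours_u | neighbours_v
    return {
        "common_neighbours": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
    }


# Toy influence graph; "geneA", "geneB" and "geneC" are hypothetical nodes.
adjacency = {
    "CTCF": {"FOXA1", "geneA"},
    "geneB": {"FOXA1", "geneA", "geneC"},
}
features = link_features(adjacency, "CTCF", "geneB")
print(features)
```

Feature vectors of this kind, computed for known and unknown pairs, would then be fed to a classifier; with heavily unbalanced data such as the 50:1 ratio mentioned above, precision-oriented metrics are more informative than accuracy.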

Keywords: graphic methods

THU_NGN at SemEval-2019 task 12: Toponym detection and disambiguation on scientific papers

13th International Workshop on Semantic Evaluation, SemEval 2019, co-located with the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019

Authors: Qi, Tao; Ge, Suyu; Wu, Chuhan; Chen, Yubo; Huang, Yongfeng. Affiliation: Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

ISBN: (print) 9781950737062

Toponym resolution is an important and challenging task in the natural language processing field, with wide applications such as emergency response and geographical event analysis on social media. Toponym resolution can be roughly divided into two independent steps: toponym detection and toponym disambiguation. To facilitate the study of toponym resolution, SemEval 2019 task 12 was proposed, which contains three subtasks: toponym detection, toponym disambiguation and toponym resolution. In this paper, we introduce our system that participated in SemEval 2019 task 12. For toponym detection, we use TagLM as the basic model and explore the use of various features, such as word embeddings extracted from pre-trained language models, POS tags, and lexical features extracted from dictionaries. For toponym disambiguation, we propose a heuristic rule-based method using toponym frequency and population. Our systems achieved 83.03% strict macro F1, 74.50% strict micro F1, 85.92% overlap macro F1 and 78.47% overlap micro F1 in the toponym detection subtask. © 2019 Association for Computational Linguistics
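
The difference between the macro and micro F1 scores reported above can be shown with a short sketch. It assumes macro F1 averages per-document F1 scores while micro F1 pools the counts across documents; the task's official scorer may differ in detail, and the counts below are made up.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(per_doc_counts):
    """Return (macro F1, micro F1) for a list of (tp, fp, fn) tuples, one per document."""
    # Macro: score each document separately, then average the scores.
    macro = sum(f1(*counts) for counts in per_doc_counts) / len(per_doc_counts)
    # Micro: pool the counts over all documents, then score once.
    tp = sum(c[0] for c in per_doc_counts)
    fp = sum(c[1] for c in per_doc_counts)
    fn = sum(c[2] for c in per_doc_counts)
    return macro, f1(tp, fp, fn)


# Illustrative counts: a short document with one miss, a long one with two false alarms.
macro, micro = macro_micro_f1([(1, 0, 1), (8, 2, 0)])
print(round(macro, 4), round(micro, 4))
```

Macro averaging gives every document equal weight, whereas micro averaging lets documents with many toponyms dominate, which is why the two figures for the same system can differ by several points.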

Keywords: Heuristic methods

Identification of semantic patterns in full-text documents using neural network methods

29th International Conference on Computer Graphics and Vision, GraphiCon 2019

Authors: Zolotarev, O.; Solomentsev, Y.; Khakimova, A.; Charnine, M. Affiliations: Russian New University, Moscow, Russia; Moscow Institute of Physics and Technology, Moscow, Russia; Research Center for Physical and Technical Informatics, Nizhny Novgorod, Russia; Institute of Informatics Problems, FRS CSC of the Russian Academy of Sciences, Moscow, Russia

Processing and text mining are becoming increasingly feasible thanks to the development of computer technology and of artificial intelligence (machine learning). This article describes approaches to the analysis of natural language texts using methods of morphological, syntactic and semantic analysis. Morphological and syntactic analysis of the text is carried out using the Pullenti system, which not only normalizes words but also identifies named entities, their characteristics, and the relationships between them. As a result, a semantic network of related named entities is built, covering people, positions, geographical names, business associations, documents, education, dates, etc. The word2vec technology is used to identify semantic patterns in the text based on the co-occurrence of terms. The possibility of using the described technologies jointly is being considered. Copyright © 2019 for this paper by its authors.

Keywords: Syntactics
