Search Results - Inner Mongolia University Library

International Conference on Computer Information Science and Artificial Intelligence (CISAI)

Authors: Yongkang Du, Weilong Ding, Ying Liang (School of Information Science and Technology, North China University of Technology, Beijing, China; Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China)

ISBN (print): 9781665406932

With the development of information technology, academic data has increased dramatically. Nowadays, recommendation algorithms are widely used to help scholars find useful information in massive data. However, due to the discrete data and complex semantic relationships in academic networks, existing recommendation algorithms are limited in feature extraction and suffer from data sparsity, which negatively affects recommendation accuracy and personalization. To solve these problems, we propose FF-PRec, a paper recommendation method based on feature fusion in the academic network. First, FF-PRec uses graph representation learning and a natural language processing tool to extract network features and text features, respectively. The two types of features are combined into the representation vectors of scholars and papers. Second, a meta-path is designed based on prior knowledge to guide semantic information extraction. To validate the proposed method, we conducted experiments on the AMiner dataset. The experimental results indicate that FF-PRec outperforms traditional methods in paper recommendation tasks and shows high accuracy and correlation.

Keywords: Representation learning, Information science, Correlation, Semantics, Feature extraction, Information retrieval, Natural language processing
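The abstract says the network and text features are "combined" but not how. The sketch below assumes the simplest option, concatenating the two L2-normalized vectors, and then ranks candidate papers by cosine similarity to a scholar's fused vector. The names `fuse` and `recommend` are hypothetical, the meta-path step is omitted, and this is not FF-PRec's published implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(network_vec: Sequence[float], text_vec: Sequence[float]) -> Vector:
    """Assumed fusion rule: concatenate the two L2-normalized feature vectors."""
    return l2_normalize(network_vec) + l2_normalize(text_vec)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def recommend(scholar: Vector, papers: Dict[str, Vector], k: int) -> List[str]:
    """Rank candidate papers by cosine similarity to the scholar's fused vector."""
    ranked = sorted(papers, key=lambda pid: cosine(scholar, papers[pid]), reverse=True)
    return ranked[:k]
```

Normalizing each part before concatenation keeps one feature type from dominating the similarity just because its vectors happen to have larger magnitudes.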


Comparing approaches to Dravidian language identification

arXiv, 2021

Authors: Jauhiainen, Tommi; Ranasinghe, Tharindu; Zampieri, Marcos (University of Helsinki, Finland; University of Wolverhampton, United Kingdom; Rochester Institute of Technology, United States)

This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at the VarDial 2021 workshop. The DLI training set includes 16,674 YouTube comments written in Roman script containing code-mixed text with English and one of the three South Dravidian languages: Kannada, Malayalam, and Tamil. We submitted results generated using two models: a Naive Bayes classifier with adaptive language models, which has been shown to obtain competitive performance in many language and dialect identification tasks, and a transformer-based model, which is widely regarded as the state of the art in a number of NLP tasks. Our first submission was sent in the closed submission track using only the training set provided by the shared task organisers, whereas the second submission is considered open because it used a pretrained model trained on external data. Our team attained shared second position in the shared task with the submission based on Naive Bayes. Our results reinforce the idea that deep learning methods are not as competitive in language identification tasks as they are in many other text classification tasks. © 2021, CC BY.

Keywords: Natural language processing systems
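For readers unfamiliar with the Naive Bayes approach mentioned above, here is a minimal character n-gram multinomial Naive Bayes language identifier with add-one smoothing. It is a simplified sketch: it omits the adaptive language models used by team HWR, and the class name `CharNgramNaiveBayes` and the toy labels in the usage are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Overlapping character n-grams, with a space added at each end as a boundary marker."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-alpha smoothing."""

    def __init__(self, n: int = 3, alpha: float = 1.0) -> None:
        self.n = n
        self.alpha = alpha
        self.gram_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set = set()

    def fit(self, texts: Iterable[str], labels: Iterable[str]) -> "CharNgramNaiveBayes":
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, counts in self.gram_counts.items():
            denom = sum(counts.values()) + self.alpha * vocab_size
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: `CharNgramNaiveBayes().fit(train_texts, train_labels).predict(comment)` returns the most probable label; character n-grams suit romanized, code-mixed text because they need no word segmentation or language-specific tokenizer.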


Information extraction from digital social trace data with applications to social media and scholarly communication data


Author: Mishra, Shubhanshu (University of Illinois Urbana-Champaign)

Degree: Ph.D.

Information extraction (IE) aims at extracting structured data from unstructured or semi-structured data. The thesis starts by identifying social media data and scholarly communication data as a special case of digital social trace data (DSTD). This identification allows us to utilize the graph structure of the data (e.g., a user connected to a tweet, an author connected to a paper, authors connected to other authors) to develop new information extraction tasks. The thesis focuses on information extraction from DSTD, first using only the text data from tweets and scholarly paper abstracts, and then using the full graph structure of Twitter and scholarly communication datasets. This thesis makes three major contributions. First, new IE tasks based on the DSTD representation of the data are introduced. For scholarly communication data, methods are developed to identify article-level and author-level novelty and expertise. Furthermore, interfaces for examining the extracted information are introduced. A social communication temporal graph (SCTG) is introduced for comparing different communication data, such as tweets tagged with sentiment, tweets about a search query, and Facebook group posts. For social media, new text classification categories are introduced with the aim of identifying enthusiastic and supportive users via their tweets. Additionally, the correlation between sentiment classes and Twitter metadata in public corpora is analyzed, leading to the development of a better model for sentiment classification. Second, methods are introduced for extracting information from social media and scholarly data. For scholarly data, a semi-automatic method is introduced for the construction of a large-scale taxonomy of computer science concepts. The method relies on the Wikipedia category tree. The constructed taxonomy is used to identify key computer science phrases in scholarly papers and to track their evolution over time. Similarly, for social media data, machine lear…

Keywords: Social Media Analysis, Machine Learning, Data Mining, Scholarly Data Analysis, Digital Libraries, Visualization, Computer Science, Information Science, Open Source, Multi-task Learning, Deep Learning, Active Learning, Natural Language Processing, Big Data Analysis
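The taxonomy-based step described above (spot key computer science phrases in papers, then track them over time) can be sketched as whole-word phrase matching followed by per-year counting. The short term list in the usage stands in for the Wikipedia-derived taxonomy, which is not reproduced here; `find_phrases` and `phrase_trends` are hypothetical names, and the thesis's actual matching method may differ.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_phrases(text: str, terms: Iterable[str]) -> Set[str]:
    """Return the taxonomy terms that occur in the text as whole-word, case-insensitive matches."""
    lowered = text.lower()
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.add(term)
    return found


def phrase_trends(papers: Iterable[Tuple[int, str]], terms: Iterable[str]) -> Dict[int, Counter]:
    """Count, per publication year, how many papers mention each taxonomy term."""
    term_list = list(terms)
    trends: Dict[int, Counter] = defaultdict(Counter)
    for year, abstract in papers:
        trends[year].update(find_phrases(abstract, term_list))
    return trends
```

Counting each term at most once per paper (via the set) measures how many papers adopt a concept in a given year rather than how often one paper repeats it.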


Efficient Neural Architecture Search for End-to-End Speech Recognition Via Straight-Through Gradients


IEEE Spoken Language Technology Workshop

Authors: Huahuan Zheng, Keyu An, Zhijian Ou (Speech Processing and Machine Intelligence (SPMI) Lab, Tsinghua University, China)

ISBN (electronic): 9781728170664

ISBN (print): 9781728170671

Neural Architecture Search (NAS), the process of automating architecture engineering, is an appealing next step in advancing end-to-end Automatic Speech Recognition (ASR), replacing expert-designed networks with learned, task-specific architectures. In contrast to early computationally demanding NAS methods, recent gradient-based NAS methods, e.g., DARTS (Differentiable ARchiTecture Search), SNAS (Stochastic NAS) and ProxylessNAS, significantly improve NAS efficiency. In this paper, we make two contributions. First, we rigorously develop an efficient NAS method via Straight-Through (ST) gradients, called ST-NAS. Basically, ST-NAS uses the loss from SNAS but uses ST to back-propagate gradients through discrete variables to optimize the loss, which is not revealed in ProxylessNAS. Using ST gradients to support sub-graph sampling is a core element in achieving efficient NAS beyond DARTS and SNAS. Second, we successfully apply ST-NAS to end-to-end ASR. Experiments on the widely benchmarked 80-hour WSJ and 300-hour Switchboard datasets show that the architectures induced by ST-NAS significantly outperform the human-designed architecture on both datasets. Strengths of ST-NAS, such as architecture transferability and low memory and time cost, are also reported.

Keywords: Conferences, Computer architecture, Switches, Benchmark testing, Computational efficiency, Task analysis, Automatic speech recognition
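The straight-through (ST) idea can be illustrated with a generic sketch: the forward pass samples one candidate operation and uses a hard one-hot mask, so only one sub-graph runs, while the backward pass treats that mask as if it were the softmax probabilities. This shows only the generic ST estimator, assuming a plain softmax over architecture logits; it is not the ST-NAS code, and the upstream gradient with respect to the one-hot mask is taken as given.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def st_forward(logits: List[float], rng: random.Random) -> Tuple[int, List[float], List[float]]:
    """Forward pass: sample ONE candidate op and return a hard one-hot mask.

    Only the sampled op (sub-graph) needs to be executed downstream.
    """
    probs = softmax(logits)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    one_hot = [1.0 if i == idx else 0.0 for i in range(len(probs))]
    return idx, one_hot, probs


def st_backward(probs: List[float], grad_one_hot: List[float]) -> List[float]:
    """Backward pass: pretend d(one_hot)/d(logits) is the softmax Jacobian.

    With J[i][j] = p_i * (delta_ij - p_j):
    dL/dz_j = p_j * (g_j - sum_i g_i * p_i).
    """
    expected = sum(g * p for g, p in zip(grad_one_hot, probs))
    return [p * (g - expected) for g, p in zip(grad_one_hot, probs)]
```

The resulting logit gradients always sum to zero, as softmax gradients must, and they vanish when every candidate op receives the same upstream gradient.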


A practical guide to hybrid natural language processing


2020

Authors: Jose Manuel Gomez-Perez, Ronald Denaux, Andres Garcia-Silva

ISBN (print): 9783030448325

This book provides readers with a practical guide to the principles of hybrid approaches to natural language processing (NLP) that combine neural methods and knowledge graphs. To this end, it first introduces the main building blocks and then describes how they can be integrated to support the effective implementation of real-world NLP applications. To illustrate these ideas, the book also includes a comprehensive set of experiments and exercises involving different algorithms over a selection of domains and corpora in various NLP tasks. Throughout, the authors show how to leverage complementary representations stemming from the analysis of unstructured text corpora as well as the entities and relations described explicitly in a knowledge graph, how to integrate such representations, and how to use the resulting features to effectively solve NLP tasks in a range of domains. In addition, the book offers access to executable code with examples, exercises and real-world applications in key domains, such as disinformation analysis and machine reading comprehension of scientific literature. All the examples and exercises proposed in the book are available as executable Jupyter notebooks in a GitHub repository. They are all ready to be run on Google Colaboratory or, if preferred, in a local environment. A valuable resource for anyone interested in the interplay between neural and knowledge-based approaches to NLP, this book is a useful guide for readers with a background in structured knowledge representations as well as those whose main approach to AI is fundamentally based on logic. Further, it will appeal to readers whose main background is in machine and deep learning and who are looking for ways to leverage structured knowledge bases to optimize results in downstream NLP tasks.

Keywords: Natural language processing (Computer science), Artificial intelligence, Application software


Temporal Knowledge Graph Completion Based on a Time Orthogonal Transformation Matrix


Computer Technology and Development, 2025, Vol. 35, No. 1, pp. 124-131

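Authors: Zhang Kun (张昆), Wu Yongcheng (吴永城), Zhang Yongwei (张永伟), Jia Wei (贾玮), Zhai Shichen (翟世臣), Yan Li (严丽) (Nanjing NARI Intelligent Transportation Technology Co., Ltd., Nanjing, Jiangsu 210000; Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106)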


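Temporal Knowledge Graph Completion (TKGC) aims to complete the missing parts of temporal knowledge quadruples based on an existing knowledge graph, which can improve the performance of downstream applications such as recommender systems and information retrieval. However, current TKGC methods mainly target entity completion or relation completion and ignore the time element, which is essential in temporal knowledge graphs. As a result, they cannot effectively handle entity completion, relation completion, and timestamp completion at the same time, which limits the ability of natural language processing systems to accurately understand and process temporal information. To address this problem, a TKGC method based on a time orthogonal transformation matrix (Orthogonal Transformation Matrix-based TKGC, OTM) is proposed. Specifically, the model consists of two stages: temporal entity and relation embedding, and a scoring function. In the first stage, a time orthogonal transformation matrix is introduced, and the norm-preserving property of orthogonal matrices is used to learn time-aware entity and relation representations. In the second stage, a scoring function is designed to evaluate the plausibility of temporal knowledge quadruples, and a loss function is designed to update the vector representations of entities, relations, and timestamps. Entity completion, relation completion, and time completion experiments were conducted on the widely used YAGO and Wikidata datasets. The experimental results demonstrate the feasibility and effectiveness of the model for temporal knowledge graph completion.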

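Keywords: temporal knowledge graph, entity completion, relation completion, time completion, time orthogonal transformation matrix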
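The abstract does not give OTM's exact scoring function. To illustrate the norm-preserving idea only, the sketch below assumes each timestamp owns a block-diagonal rotation (an orthogonal matrix) parameterized by one angle per 2-D block, and plugs the rotated entities into a TransE-style distance. `time_rotate` and `score` are hypothetical; the paper's matrices and score may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]


def time_rotate(vec: Sequence[float], angles: Sequence[float]) -> Vector:
    """Apply a block-diagonal orthogonal matrix built from 2x2 rotations.

    Assumption: each timestamp owns one rotation angle per 2-D block.
    """
    if len(vec) != 2 * len(angles):
        raise ValueError("vector length must be twice the number of angles")
    out: Vector = []
    for k, theta in enumerate(angles):
        x, y = vec[2 * k], vec[2 * k + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([c * x - s * y, s * x + c * y])
    return out


def l2(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def score(head: Sequence[float], rel: Sequence[float], tail: Sequence[float],
          time_angles: Sequence[float]) -> float:
    """Hypothetical TransE-style plausibility in the time-rotated space (higher is better)."""
    h_t = time_rotate(head, time_angles)
    t_t = time_rotate(tail, time_angles)
    return -l2([h + r - t for h, r, t in zip(h_t, rel, t_t)])
```

Because a rotation never changes vector length, the time transform can move entities into a time-specific position without inflating or shrinking their embeddings, which is the norm-preservation property the abstract relies on.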


A Cybersecurity Entity Recognition Method Based on Character Representation Learning and Temporal Boundary Diffusion


Journal of Electronics & Information Technology, 2025, Vol. 47, No. 5, pp. 1554-1568

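Authors: Hu Ze (胡泽), Li Wenjun (李文君), Yang Hongyu (杨宏宇) (College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300; College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300)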


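Cybersecurity entity recognition, as the foundation of threat information extraction and knowledge graph construction, plays a vital role in discovering and responding to cyber threats. Current mainstream named entity recognition (NER) methods generalize poorly in the cybersecurity domain and have difficulty clearly determining the boundaries of cybersecurity entities. To address these problems, this paper proposes a cybersecurity entity recognition method based on character representation learning and temporal boundary diffusion. First, the method decomposes NER into two subtasks, entity boundary detection and entity classification, which are handled separately. Second, for entity boundary detection, a question-answering-based approach encodes predefined questions together with the data, a dilated convolutional residual character network extracts character-level features, and a temporal boundary diffusion network determines the entity boundaries. Third, entity classification likewise uses the question-answering approach, with an independently trained classifier determining the entity types. Finally, the output of the boundary detection task is fed into the entity classification task to determine the type of each entity. To verify its effectiveness, the method was tested on the cyber threat intelligence dataset DNRTI. Experimental results show that improving the efficiency of boundary detection effectively enhances NER performance. The method has low resource overhead and outperforms baseline methods proposed in recent years, improving the F1 score by 0.40% to 1.65% over methods from the past two years.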

Keywords: named entity recognition, cybersecurity, boundary detection, deep learning, natural language processing
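The two-stage pipeline described above (detect boundaries first, then type each detected span) can be sketched independently of the neural encoders. The sketch assumes that some upstream model, such as the dilated residual character network, has already produced per-character start and end probabilities. The decoding rule (nearest end at or after each start, no overlaps), the 0.5 threshold, and the label names in the usage are assumptions, not the paper's method.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive character offsets


def decode_spans(start_probs: Sequence[float], end_probs: Sequence[float],
                 threshold: float = 0.5) -> List[Span]:
    """Stage 1 (boundary detection): pair each start with the nearest end at or after it.

    Starts that fall inside an already decoded span are skipped, so spans never overlap.
    """
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans: List[Span] = []
    last_end = -1
    for s, p in enumerate(start_probs):
        if p < threshold or s <= last_end:
            continue
        end = next((i for i in ends if i >= s), None)
        if end is not None:
            spans.append((s, end))
            last_end = end
    return spans


def recognize(text: str, start_probs: Sequence[float], end_probs: Sequence[float],
              classify: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Stage 2 (classification): type every detected span with a separately trained classifier."""
    return [(text[s:e + 1], classify(text[s:e + 1]))
            for s, e in decode_spans(start_probs, end_probs)]
```

Keeping `classify` as a plain callable mirrors the decoupling in the abstract: the boundary detector and the classifier can be trained and swapped independently.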


Contextualized knowledge-aware attentive neural network: Enhancing answer selection with knowledge

arXiv, 2021

Authors: Deng, Yang; Xie, Yuexiang; Li, Yaliang; Yang, Min; Lam, Wai; Shen, Ying (The Chinese University of Hong Kong, Hong Kong; Alibaba Group, China; Alibaba Group, United States; SIAT, Chinese Academy of Sciences, China; Sun Yat-Sen University, China)

Answer selection, which is involved in many natural language processing applications such as dialog systems and question answering (QA), is an important yet challenging task in practice, since conventional methods typically ignore diverse real-world background knowledge. In this paper, we extensively investigate approaches to enhancing the answer selection model with external knowledge from a knowledge graph (KG). First, we present a context-knowledge interaction learning framework, the Knowledge-aware Neural Network (KNN), which learns QA sentence representations by considering a tight interaction between the external knowledge from the KG and the textual information. Then, we develop two kinds of knowledge-aware attention mechanisms to summarize both the context-based and knowledge-based interactions between questions and answers. To handle the diversity and complexity of KG information, we further propose a Contextualized Knowledge-aware Attentive Neural Network (CKANN), which improves knowledge representation learning with structure information via a customized Graph Convolutional Network (GCN) and comprehensively learns context-based and knowledge-based sentence representations via a multi-view knowledge-aware attention mechanism. We evaluate our method on four widely used benchmark QA datasets: WikiQA, TREC QA, InsuranceQA and Yahoo QA. The results verify the benefits of incorporating external knowledge from the KG and show the robust superiority and broad applicability of our method. © 2021, CC BY.

Keywords: Knowledge graph
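As a rough illustration of knowledge-aware attention, the sketch assumes that each token vector is its word embedding concatenated with the embedding of its linked KG entity, then applies plain dot-product attention over the answer tokens using the mean-pooled question as the query. It covers a single attention view with no GCN, and all function names are hypothetical; it is not the KNN or CKANN architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def enrich(word_vecs, entity_vecs) -> List[Vector]:
    """Assumption: each token = its word embedding concatenated with its linked KG entity embedding."""
    return [list(w) + list(e) for w, e in zip(word_vecs, entity_vecs)]


def attend(query: Sequence[float], tokens: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention: weight answer tokens by relevance to the question summary."""
    weights = softmax([dot(query, t) for t in tokens])
    summary = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(len(tokens[0]))]
    return weights, summary


def score_answer(question_tokens: List[Vector], answer_tokens: List[Vector]) -> float:
    """Cosine between the mean-pooled question and the attention-pooled answer."""
    q = [sum(col) / len(question_tokens) for col in zip(*question_tokens)]
    _, a = attend(q, answer_tokens)
    return cosine(q, a)
```

Because the KG half of each token vector participates in the dot products, an answer can score well by sharing linked entities with the question even when their surface words differ.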


DANGNT@***-HCM at SemEval 2019 Task 1: Graph Transformation System from Stanford Basic Dependencies to Universal Conceptual Cognitive Annotation (UCCA)


13th International Workshop on Semantic Evaluation, SemEval 2019, co-located with the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019

Authors: Nguyen, Dang Tuan; Tran, Trung (University of Information Technology, VNU-HCM, Ho Chi Minh City, Viet Nam)

ISBN (print): 9781950737062

This paper describes the graph transformation system (GT System) for SemEval 2019 Task 1: Cross-lingual Semantic Parsing with Universal Conceptual Cognitive Annotation (UCCA). The input of GT System is a pair consisting of a text and its unannotated XML, which is the layer 0 part of the UCCA form. The output of GT System is the corresponding full UCCA XML. Based on the idea of graph illustration and transformation, we perform four main tasks when building GT System. In the first task, we represent the Stanford dependencies of the input text in graph form. In the second task, we transform this graph into an intermediate graph. In the third task, we further transform it into the output graph form. Finally, we create the output UCCA XML. The evaluation results show that our method generates good-quality UCCA XML and makes a meaningful contribution to the semantic representation sub-field of natural language processing. © 2019 Association for Computational Linguistics

Keywords: Semantics
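The first two GT System tasks (represent the dependencies as a graph, then transform that graph) can be sketched as an adjacency list plus a rule-based relabeling step. The rule table below is a hypothetical, deliberately tiny mapping onto UCCA-style category labels; the paper's actual transformation rules are not given in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (head, dependency relation, dependent)
Graph = Dict[str, List[Tuple[str, str]]]  # head -> [(label, dependent), ...]

# Hypothetical rule table from dependency relations to UCCA-style category labels.
DEP_TO_UCCA: Dict[str, str] = {
    "root": "P",    # main predicate -> Process
    "nsubj": "A",   # subject -> Participant
    "dobj": "A",    # direct object -> Participant
    "advmod": "D",  # adverbial modifier -> Adverbial
    "amod": "E",    # adjectival modifier -> Elaborator
    "det": "F",     # determiner -> Function (a simplification)
}


def to_graph(edges: List[Edge]) -> Graph:
    """Task 1: represent dependency edges as an adjacency list keyed by head."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, rel, dep in edges:
        graph[head].append((rel, dep))
    return dict(graph)


def transform(graph: Graph, rules: Dict[str, str] = DEP_TO_UCCA,
              default: str = "UNMAPPED") -> Graph:
    """Task 2: relabel every edge via the rule table; unknown relations are flagged."""
    return {head: [(rules.get(rel, default), dep) for rel, dep in children]
            for head, children in graph.items()}
```

Flagging unmapped relations instead of dropping them keeps every dependent in the intermediate graph, so a later stage can still attach it when producing the final UCCA structure.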


Construction of a Person Knowledge Graph Based on Public Résumé Data


Data Analysis and Knowledge Discovery, 2021, Vol. 5, No. 7, pp. 81-90

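Authors: Shen Kejie (沈科杰), Huang Huanting (黄焕婷), Hua Bolin (化柏林) (Department of Information Management, Peking University, Beijing 100871)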


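[Objective] Based on public résumé information, this study combines natural language processing with knowledge graph construction techniques to automatically build a résumé knowledge graph, providing new perspectives and tools for traditional research. [Application Background] The system automatically extracts personal background and job title information from résumé data and builds relations such as employment history and institutional colleagues; through visualization, it provides decision support for talent selection and personnel appointments and dismissals in enterprises and public institutions. [Methods] After résumé data are collected with a web crawler, a BERT-BiLSTM-CRF model performs entity recognition, relations between entities are constructed by defining rules and incorporating external domain knowledge, and the Neo4j graph database is used to store the entities and relations and to visualize the graph. [Results] The BERT-BiLSTM-CRF model achieved an accuracy of 84.85% on the entity recognition test set. The graph covers the résumé information of 561 officials and contains 8,174 entities of 3 types and 20,162 relations of 5 types, supporting querying, analysis, and mining from multiple perspectives. [Conclusions] The constructed knowledge graph uncovers the intrinsic connections among résumé texts and offers a novel, easy-to-use solution for research and applications based on résumé data, but it currently lacks fine-grained entity alignment and the construction of subordination relations between institution entities.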

Keywords: résumé analysis, knowledge graph, entity recognition, person graph
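Rule-based relation building, such as the "institutional colleague" relation mentioned above, can be sketched as follows, assuming each extracted tenure record holds a person, an organization, and start and end years. The overlap rule and the `Tenure`/`colleague_edges` names are assumptions for illustration; the resulting triples could then be loaded into a graph store such as Neo4j.

```python
from itertools import combinations
from typing import List, NamedTuple, Set, Tuple


class Tenure(NamedTuple):
    person: str
    org: str
    start: int  # first year in post
    end: int    # last year in post (inclusive)


def colleague_edges(tenures: List[Tenure]) -> Set[Tuple[str, str, str]]:
    """Assumed rule: two different people are colleagues at an org if their tenures there overlap."""
    edges = set()
    for a, b in combinations(tenures, 2):
        same_org = a.org == b.org
        overlap = a.start <= b.end and b.start <= a.end
        if a.person != b.person and same_org and overlap:
            p, q = sorted((a.person, b.person))  # store each pair once, in a canonical order
            edges.add((p, q, a.org))
    return edges
```

Requiring overlapping years, not just the same employer, avoids linking people who served at one institution decades apart.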

