Stream processing prevails, and SQL querying over streams has become one of its most popular application scenarios; in 2021, for example, the number of active IoT endpoints worldwide reached 12.3 billion. Unfortunately, the increasing scale of data and stringent user requirements place heavy pressure on existing stream processing systems, which must deliver high processing throughput at low latency. To further improve the performance of current stream processing systems, we propose a compression-based stream processing engine, called CompressStreamDB, which enables adaptive fine-grained stream processing directly on compressed streams, without decompression. Specifically, CompressStreamDB incorporates eight compression methods targeting the various data types found in streams, together with a cost model for dynamically selecting the appropriate method. By exploiting data redundancy in streams, CompressStreamDB not only saves space in data transmission between client and server, but also achieves high throughput and low latency for SQL queries over streams. Our experimental results show that, compared to a state-of-the-art stream processing system running on uncompressed streams, CompressStreamDB achieves a 3.24× throughput improvement and 66.0% lower latency on average, while saving 66.8% of space.
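To make the adaptive selection concrete, here is a minimal sketch of a cost-model-driven compressor choice: estimate the compressed size under each candidate method and pick the cheapest. The three encodings and their size estimates are illustrative assumptions, not CompressStreamDB's actual eight methods or its cost model.

```python
# Hypothetical sketch of cost-model-driven compressor selection. The method
# names and cost formulas below are illustrative assumptions.

def rle_cost(values):
    """Estimated run-length-encoding size: one (value, run_length) pair per run."""
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)
    return runs * 2

def dict_cost(values):
    """Estimated dictionary-encoding size: dictionary entries plus one code per value."""
    return len(set(values)) + len(values) * 0.5  # codes assumed half-width

def delta_cost(values):
    """Estimated delta-encoding size: first value plus narrow deltas."""
    deltas = [abs(b - a) for a, b in zip(values, values[1:])]
    width = 1 if deltas and max(deltas) < 128 else 2
    return 1 + len(deltas) * width

def pick_compressor(values):
    """Cost model: choose the method with the smallest estimated output size."""
    candidates = {"rle": rle_cost, "dict": dict_cost, "delta": delta_cost}
    return min(candidates, key=lambda name: candidates[name](values))

print(pick_compressor([7, 7, 7, 7, 8, 8, 8]))  # long runs -> "rle"
print(pick_compressor(list(range(100))))       # monotonic  -> "delta"
```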
Lexical simplification, the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning, has attracted much attention across many languages. Although the richness of voca...
To support dramatically increased traffic loads, communication networks have grown denser and more complex. Traditional cell association (CA) schemes are time-consuming, forcing researchers to seek fast alternatives. This paper proposes a deep Q-learning based scheme, whose main idea is to train a deep neural network (DNN) to calculate the Q values of all the state-action pairs; the cell holding the maximum Q value is selected. In the training stage, the intelligent agent continuously generates samples through the trial-and-error method to train the DNN until convergence. In the application stage, the state vectors of all the users are fed into the trained DNN to quickly obtain a satisfactory CA result for a scenario with the same BS locations and user distribution. Simulations demonstrate that the proposed scheme provides satisfactory CA results in a computational time several orders of magnitude shorter than traditional schemes. Moreover, performance metrics such as capacity and fairness can be guaranteed.
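The application stage described above reduces to a single forward pass followed by an argmax over cells. Below is a minimal sketch of that step; the network shape, the state features, and the stand-in weights are assumptions for illustration, and the trial-and-error training stage is omitted.

```python
# Sketch of greedy cell association with a (stand-in) trained Q-network:
# state vector -> Q value per candidate cell -> pick the max-Q cell.

import numpy as np

rng = np.random.default_rng(0)
NUM_CELLS, STATE_DIM, HIDDEN = 5, 8, 16

# Stand-in weights; in practice these come from the converged training stage.
W1 = rng.normal(size=(STATE_DIM, HIDDEN))
W2 = rng.normal(size=(HIDDEN, NUM_CELLS))

def q_values(state):
    """Forward pass of the Q-network: state -> Q value for every candidate cell."""
    hidden = np.maximum(state @ W1, 0.0)  # ReLU layer
    return hidden @ W2

def associate(state):
    """Greedy cell association: the cell holding the maximum Q value is selected."""
    return int(np.argmax(q_values(state)))

user_state = rng.normal(size=STATE_DIM)  # e.g., SINRs/loads of nearby cells
print("associate user with cell", associate(user_state))
```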
Machine reading comprehension (MRC) has been a research focus in natural language processing and intelligence analysis. However, there is a lack of models and datasets for MRC tasks in the anti-terrorism domain. Moreover, current research lacks the ability to embed accurate background knowledge and provide precise answers. To address these two problems, this paper first builds a text corpus and testbed focused on the anti-terrorism domain in a semi-automatic manner. It then proposes a knowledge-based machine reading comprehension model that fuses domain-related triples from a large-scale encyclopedic knowledge base to enhance the semantics of the passage. To eliminate knowledge noise that could lead to semantic deviation, the model uses a mixed mutual attention mechanism among questions, passages, and knowledge triples to select the most relevant triples before embedding their semantics into the passage. Experimental results indicate that the proposed approach achieves a 70.70% EM value and an 87.91% F1 score, improvements of 4.23% and 3.35% over existing methods, respectively.
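The triple-selection step can be pictured as scoring each triple against both the question and the passage and keeping only the jointly relevant ones. The sketch below loosely follows that mixed mutual-attention idea; the random embeddings, the multiplicative combination, and the top-k cutoff are all stand-ins for the learned mechanism.

```python
# Illustrative triple filtering: a triple is kept only if it attends strongly
# to BOTH the question and the passage, suppressing knowledge noise.

import numpy as np

rng = np.random.default_rng(1)
DIM, NUM_TRIPLES, TOP_K = 64, 10, 3

question = rng.normal(size=DIM)
passage = rng.normal(size=DIM)
triples = rng.normal(size=(NUM_TRIPLES, DIM))  # encoded (head, relation, tail)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Mutual attention: relevance to the question and to the passage.
q_att = softmax(triples @ question)
p_att = softmax(triples @ passage)
score = q_att * p_att  # joint relevance; noisy triples score low on one side

keep = np.argsort(score)[::-1][:TOP_K]
print("triples fused into the passage representation:", keep.tolist())
```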
ISBN (print): 9781665454452
GPU's powerful computational capacity holds great potential for processing hierarchically-compressed data without decompression in the data science domain. Unfortunately, existing GPU approaches offer only traversal-based data analytics; random access is extremely inefficient, substantially limiting their utility. To solve this problem, we develop a novel and broadly applicable optimization that enables efficient random access to hierarchically-compressed data, without decompression, in GPU memory. We address three major challenges for enabling efficient random access to compressed data on GPUs. The first challenge is designing GPU data structures that support random access. The second challenge is efficiently generating these data structures on the GPU: generating them for random access is costly on the CPU, and the inefficiency increases dramatically once PCIe data transmission is incorporated. The third challenge is query processing on compressed data in GPU memory: random accesses, including data updates, result in significant conflicts between massive threads. To solve the first challenge, we propose and adapt a number of compressed data structures, including indexing within the complicated GPU memory hierarchy. To address the second challenge, we develop a two-phase process for generating these data structures on the GPU. To handle the third challenge, we propose a double-parsing design that avoids data conflicts. We evaluate our solution on two GPU platforms using five real-world datasets. Experiments show that random access operations on the GPU achieve a 65.04× average speedup compared to the state-of-the-art method.
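The core idea of random access without decompression can be shown on a CPU with a toy grammar-compressed sequence: store the expanded length of every rule, then descend the hierarchy to the requested offset, touching only O(depth) nodes instead of materializing the whole sequence. The grammar below is a made-up example; the paper's GPU data structures and two-phase construction are not reproduced here.

```python
# Random access into a grammar-compressed string via precomputed rule lengths.

from functools import lru_cache

# Rule: name -> list of symbols (terminal chars or other rule names).
rules = {
    "S": ["A", "A", "B"],
    "A": ["a", "b"],
    "B": ["A", "c"],
}

@lru_cache(maxsize=None)
def expanded_len(sym):
    """Expanded length of a symbol: 1 for terminals, memoized for rules."""
    if sym not in rules:
        return 1
    return sum(expanded_len(s) for s in rules[sym])

def access(sym, i):
    """Return character i of sym's expansion without materializing it."""
    if sym not in rules:
        return sym
    for child in rules[sym]:
        n = expanded_len(child)
        if i < n:
            return access(child, i)
        i -= n
    raise IndexError(i)

print("".join(access("S", i) for i in range(expanded_len("S"))))  # "abababc"
print(access("S", 4))  # 'a', found by descending the hierarchy, not decompressing
```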
ISBN (digital): 9798350377613
ISBN (print): 9798350377620
Knowledge graphs (KGs) play an increasingly important role in many knowledge-aware tasks. However, existing KGs struggle with incompleteness, which motivates knowledge graph completion (KGC), that is, predicting missing links between entities based on observed triples. Reasoning over relation paths in incomplete KGs is a popular approach, yet several significant issues remain to be addressed, such as path noise and the ambiguity of inferred relations. To address these problems, we propose a novel path-augmented Reasoning model with avoidance of Path noise and Disambiguation of inferred relations, referred to as RPD. In this model, we calculate the sum of resource allocation along each relation path to measure its reliability, thereby filtering out path noise. To address the ambiguity of an inferred relation, we introduce position embeddings that encode each relation's position along the path when learning path representations. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed RPD model on KGC tasks compared to state-of-the-art methods.
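The resource-allocation reliability measure can be sketched directly: a unit resource starts at the head entity, is split over each relation's out-edges step by step, and the amount arriving at the tail scores the path. The toy graph and the even-splitting rule below are assumptions for illustration, not RPD's exact formulation.

```python
# Path reliability via resource allocation over a toy knowledge graph.

from collections import defaultdict

# Knowledge graph as relation -> {head: [tails]}.
kg = {
    "born_in": {"alice": ["paris"], "bob": ["lyon", "paris"]},
    "city_of": {"paris": ["france"], "lyon": ["france"]},
}

def path_reliability(head, tail, path):
    """Amount of resource flowing from head to tail along the relation path."""
    resource = defaultdict(float)
    resource[head] = 1.0
    for rel in path:
        nxt = defaultdict(float)
        for node, amount in resource.items():
            tails = kg[rel].get(node, [])
            for t in tails:
                nxt[t] += amount / len(tails)  # split evenly over out-edges
        resource = nxt
    return resource[tail]

# A reliable path supporting the candidate triple (alice, nationality, france):
print(path_reliability("alice", "france", ["born_in", "city_of"]))  # 1.0
```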
In multi-label learning, each instance is associated with a set of labels simultaneously. Most existing studies assume that the label set of each instance is complete. In practice, however, it is generally difficult to obtain all the relevant labels of an instance; only a partial, or even empty, set of relevant labels is available, a setting called semi-supervised multi-label learning with missing labels. To tackle this problem, we propose a novel framework that exploits label correlations and instance correlations to recover the missing labels, while simultaneously utilizing a large amount of unlabeled data to improve classification performance. Specifically, a supplementary label matrix is first obtained by learning the label correlations. Second, since each class label may be decided by some specific characteristics of its own, a label-specific data representation is learned for each class label. Third, instance correlations are utilized not only to recover the missing labels but also to propagate supervision information from labeled instances to unlabeled ones. A unified objective function is designed to integrate the above components, and an accelerated proximal gradient method is adopted to solve the optimization problem. Finally, extensive experimental results on several benchmark datasets demonstrate the effectiveness of the proposed method compared with competing ones.
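The first step, building a supplementary label matrix from label correlations, can be sketched in a few lines: estimate how strongly labels co-occur, then propagate each instance's observed labels through that correlation matrix. The cosine-similarity correlation and the threshold below are illustrative stand-ins for the learned correlation in the actual framework; the label-specific representations, instance correlations, and the accelerated proximal gradient solver are omitted.

```python
# Recovering candidate missing labels via a label-correlation matrix.

import numpy as np

# Observed label matrix: rows = instances, cols = labels; 0 may mean "missing".
Y = np.array([
    [1, 1, 0],   # labels 0 and 1 co-occur; label 2 is unobserved here
    [1, 1, 1],
    [0, 1, 1],
], dtype=float)

# Label correlation from co-occurrence (cosine similarity between label columns).
norms = np.linalg.norm(Y, axis=0, keepdims=True)
C = (Y.T @ Y) / (norms.T @ norms)

# Supplementary labels: propagate each instance's known labels through C.
Y_supp = Y @ C
recovered = (Y_supp > 1.0).astype(int)  # threshold is an arbitrary choice
print(recovered)  # instance 0 now also receives label 2
```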
Data science is a rapidly growing academic field with significant implications for all conventional scientific studies. However, most relevant studies have been limited to one or several facets of data science from a specific application-domain perspective and rarely discuss its theoretical framework. Data science is unique in that its research goals, perspectives, and body of knowledge are distinct from those of other sciences. The core theories of data science are the DIKW pyramid, data-intensive scientific discovery, the data science life cycle, data wrangling (munging), big data analytics, data management and governance, data product DevOps, and big data visualization. Six main trends characterize recent theoretical studies on data science: (1) the growing significance of DataOps, (2) the rise of citizen data scientists, (3) enabling augmented data science, (4) integrating data warehouses with data lakes, (5) the diversity of domain-specific data science, and (6) implementing data stories as data products. Further development of data science should prioritize four ways of turning challenges into opportunities: (1) accelerating theoretical studies of data science, (2) navigating the trade-off between explainability and performance, (3) achieving data ethics, privacy, and trust, and (4) aligning academic curricula with industrial needs.
Large-scale pre-trained models such as GPT and BERT have demonstrated remarkable performance in information extraction tasks. However, their black-box nature poses challenges for reliability and interpretability. In contrast, rule-based extraction methods offer better interpretability but typically require domain experts to establish rules manually, limiting their generalization ability. In industry, there is often a demand for reliable knowledge extraction that reduces the time spent manually verifying each piece of extracted knowledge. In this paper, we explore the idea of combining GPT with symbolic methods to automatically discover reliable extraction patterns in text with a particular writing style. The method leverages the high information density and similar writing patterns of such text to generate verifiable and reliable patterns. Experiments on two datasets with specific writing styles demonstrate the method's effectiveness, validating the idea of combining large models with symbolic methods for reliable extraction-pattern discovery on the tested datasets.
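One way to picture the propose-and-verify combination: a large model suggests candidate patterns for stylistically uniform text, and a symbolic verifier keeps only those whose matches agree with a small labeled sample. In the sketch below, `llm_propose_patterns` is a hypothetical stand-in for the GPT call (it returns hard-coded candidates so the example stays self-contained), and the corpus and precision cutoff are made-up illustrations.

```python
# Verifying LLM-proposed extraction patterns against a labeled sample.

import re

def llm_propose_patterns():
    # Hypothetical stand-in: a real pipeline would prompt an LLM with examples.
    return [r"born in (\d{4})", r"in (\d{4})"]

corpus = [
    ("Ada Lovelace was born in 1815.", "1815"),
    ("The device, built in 2003, failed.", None),  # "in 2003" is not a birth year
    ("Alan Turing was born in 1912.", "1912"),
]

def precision(pattern):
    """Fraction of a pattern's matches that agree with the labeled sample."""
    hits, correct = 0, 0
    for text, gold in corpus:
        m = re.search(pattern, text)
        if m:
            hits += 1
            correct += m.group(1) == gold
    return correct / hits if hits else 0.0

# Keep only patterns reliable enough to skip per-item manual verification.
reliable = [p for p in llm_propose_patterns() if precision(p) >= 0.9]
print("patterns kept for automatic extraction:", reliable)
```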
Purpose: Using the metaphor of "unicorn," we identify scientific papers and technical patents characterized by the informetric feature of very high citations in the first ten years after publication, which may provide a new pattern for understanding very-high-impact works in science and technology.
Design/methodology/approach: Setting CT as the total citations of a paper or patent in the first ten years after publication, with CT ≥ 5,000 for a scientific "unicorn" and CT ≥ 500 for a technical "unicorn," we obtain an absolute standard for identifying scientific and technical "unicorn" works.
Findings: We identify 165 scientific "unicorns" among 14,301,875 WoS papers and 224 technical "unicorns" among 13,728,950 DII patents published during 2001–…. About 50% of the "unicorns" belong to biomedicine, in which selected cases are analyzed individually. The rare "unicorns" increase following a linear model; the fitted data show 95% confidence, with an RMSE of 0.2127 for the scientific "unicorn" fit and … for the technical one.
Research limitations: A "unicorn" is a purely quantitative notion that does not consider quality, and "potential unicorns" with CT ≤ 5,000 for papers and CT ≤ 500 for patents are left for future studies.
Practical implications: Scientific and technical "unicorns" provide a new pattern for understanding high-impact works in science and technology. The "unicorn" pattern supplies a concise approach to identifying very-high-impact scientific papers and technical patents.
Originality/value: The "unicorn" pattern supplies a concise approach to identifying very-high-impact scientific papers and technical patents.
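The identification rule above reduces to a simple threshold filter over ten-year citation counts. The records below are made-up illustrations, not data from the study.

```python
# "Unicorn" filter: CT is a work's citations in its first ten years,
# with CT >= 5,000 for scientific and CT >= 500 for technical unicorns.

records = [
    {"title": "paper A",  "kind": "paper",  "ct": 6200},
    {"title": "paper B",  "kind": "paper",  "ct": 800},
    {"title": "patent X", "kind": "patent", "ct": 740},
]

THRESHOLD = {"paper": 5000, "patent": 500}

unicorns = [r["title"] for r in records if r["ct"] >= THRESHOLD[r["kind"]]]
print(unicorns)  # ['paper A', 'patent X']
```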