Search results - Inner Mongolia University Library

arXiv, 2018

Authors: Mi, Yunlong; Shi, Yong; Li, Jinhai. Affiliations: School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 101408, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China; College of Information Science and Technology, University of Nebraska at Omaha, NE 68182, United States; Faculty of Science, Kunming University of Science and Technology, Kunming 650500, China

Concept-cognitive learning (CCL) has been a hot topic in recent years, attracting much attention from the formal concept analysis, granular computing and cognitive computing communities. However, the relationships among cognitive computing (CC), concept-cognitive computing (CCC), CCL and the concept-cognitive learning model (CCLM) have not been clearly described. To this end, we first explain the relationships among CC, CCC, CCL and CCLM. Then, we propose a generalized concept-cognitive learning (GCCL) approach from the point of view of machine learning. Finally, experiments on several data sets are conducted to verify the feasibility of concept formation and of the concept-cognitive process of GCCL. Copyright © 2018, The Authors. All rights reserved.

Keywords: Machine learning
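
To illustrate the "concept formation" mentioned in the abstract above, the following minimal sketch uses the two derivation operators of classical formal concept analysis to build a concept (extent, intent) from a set of objects. It is not the GCCL model, whose details the abstract does not give; the toy context and the function names `intent_of`, `extent_of` and `form_concept` are assumptions made for illustration.

```python
def intent_of(context, objects):
    """Attributes shared by every object in `objects`."""
    all_attributes = set().union(*context.values())
    # With no objects, every attribute is (vacuously) shared.
    return set.intersection(all_attributes, *(context[o] for o in objects))


def extent_of(context, attributes):
    """Objects that possess every attribute in `attributes`."""
    wanted = set(attributes)
    return {obj for obj, attrs in context.items() if wanted <= attrs}


def form_concept(context, objects):
    """Close a set of objects into a formal concept (extent, intent)."""
    intent = intent_of(context, objects)
    extent = extent_of(context, intent)
    return extent, intent


# Hypothetical formal context: object -> set of attributes.
context = {
    "sparrow": {"flies", "has_feathers"},
    "penguin": {"has_feathers", "swims"},
    "bat": {"flies"},
}

extent, intent = form_concept(context, {"bat"})
print(sorted(extent), sorted(intent))
```

Starting from the single object `bat`, the closure also pulls in `sparrow`, because both share the full intent `{"flies"}`.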

A novel distance metric: Generalized relative entropy

arXiv, 2017

Authors: Liu, Shuai; Lu, Mengye; Liu, Gaocheng; Pan, Zheng. Affiliations: College of Computer Science, Inner Mongolia University, Hohhot, China; Inner Mongolia Key Laboratory of Data Mining and Knowledge Engineering, Hohhot, China

Information entropy and its extensions, which are important generalizations of entropy, have been applied in many research domains. In this paper, a novel generalized relative entropy is constructed to avoid some defects of traditional relative entropy. We present the structure of the generalized relative entropy after discussing the defects of relative entropy. Moreover, some properties of the proposed generalized relative entropy are presented and proved. The proposed generalized relative entropy is proved to have a finite range and to be a finite distance metric. Copyright © 2017, The Authors. All rights reserved.

Keywords: Defects
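
The defects of traditional relative entropy that the abstract refers to can be made concrete with a small sketch: the Kullback-Leibler divergence is asymmetric and can be infinite. The abstract does not give the authors' generalized construction, so the code below instead contrasts it with the Jensen-Shannon distance, a different, well-known measure that is symmetric, bounded and a metric. The function names and example distributions are assumptions for illustration only.

```python
import math


def relative_entropy(p, q):
    """Traditional relative entropy (KL divergence) D(p || q), in nats."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (a substitute measure,
    not the generalized relative entropy of the paper)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    js = 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)
    return math.sqrt(max(js, 0.0))


p, q = [0.5, 0.5], [0.9, 0.1]
print(relative_entropy(p, q), relative_entropy(q, p))  # two different values
print(relative_entropy([1.0, 0.0], [0.0, 1.0]))        # unbounded
print(jensen_shannon_distance([1.0, 0.0], [0.0, 1.0])) # finite
```

With natural logarithms the Jensen-Shannon distance never exceeds the square root of ln 2, which is the kind of finite range the abstract asks of a distance measure.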

Linked data Crowdsourcing Quality Assessment based on Domain Professionalism

Journal of Physics: Conference Series, 2019, Vol. 1187, No. 5

Authors: Lu Yang; Li Huang; Zhenzhen Liu. Affiliations: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China; Key Laboratory of Intelligent Information Processing and Real-time Industrial System in Hubei Province, Wuhan 430065, China; Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan 430065, China; Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, National Press and Publication Administration, Beijing 100038, China

With the rapid development of Internet technology, crowdsourcing, as a flexible, effective and low-cost problem-solving method, has begun to receive more and more attention. The use of crowdsourcing to evaluate the quality of linked data has also become a research hotspot. This paper proposes the concept of the Domain Specialization Test (DST), which uses domain-specific professional testing tasks (DSTs) to evaluate the professionalism of workers. It also combines the idea of Mini-batch Gradient Descent (MBGD) with the EM algorithm, yielding the MBEM algorithm for efficient and accurate evaluation of task results. The experimental results show that the proposed method can screen out appropriate workers for the linked data crowdsourcing task and improve the accuracy and iteration efficiency of the results.
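
The idea of updating EM estimates from mini-batches of tasks can be sketched as follows. This is a simplified illustration, not the MBEM algorithm of the paper: it assumes binary answers, one accuracy parameter per worker, a uniform prior over the true label, and that every worker answers every task. The names `minibatch_em`, `posterior_one`, `lr` and `init_acc` are all hypothetical.

```python
def posterior_one(answers, acc):
    """E-step for one task: probability that the true label is 1."""
    like1 = like0 = 1.0
    for answer, p in zip(answers, acc):
        like1 *= p if answer == 1 else 1.0 - p
        like0 *= p if answer == 0 else 1.0 - p
    return like1 / (like1 + like0)


def minibatch_em(labels, batch_size=4, lr=0.5, epochs=5, init_acc=0.7):
    """labels[i][j] is the binary answer (0 or 1) of worker j on task i."""
    n_workers = len(labels[0])
    acc = [init_acc] * n_workers
    for _ in range(epochs):
        for start in range(0, len(labels), batch_size):
            batch = labels[start:start + batch_size]
            expected_correct = [0.0] * n_workers
            for answers in batch:
                p1 = posterior_one(answers, acc)
                for j, answer in enumerate(answers):
                    expected_correct[j] += p1 if answer == 1 else 1.0 - p1
            # M-step on this batch only, blended into the running estimate.
            for j in range(n_workers):
                estimate = expected_correct[j] / len(batch)
                blended = (1 - lr) * acc[j] + lr * estimate
                acc[j] = min(max(blended, 0.01), 0.99)  # keep away from 0 and 1
    return acc


# Toy data: three reliable workers and one who always answers 0.
truth = [0, 1] * 10
labels = [[z, z, z, 0] for z in truth]
print([round(a, 2) for a in minibatch_em(labels)])
```

Because each update touches only one batch, the accuracies are refined several times per pass over the data instead of once, which is the efficiency argument behind mini-batch variants.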

Emergency Event Matching using Hierarchical Blocking Method

Journal of Physics: Conference Series, 2019, Vol. 1187, No. 5

Authors: Chang Wen; Yu Liu. Affiliations: College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China; Key Laboratory of Intelligent Information Processing and Real-time Industrial System in Hubei Province, Wuhan 430065, China; Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan 430065, China; Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, National Press and Publication Administration, Beijing 100038, China

With the extensive application of knowledge bases (KBs), how to complete them is a hot topic on the Semantic Web. However, big data brings many problems, and event matching is one of them: it means finding the entities that refer to the same thing in the real world, and it is also the key step in the extension process. To enrich the emergency knowledge base (E-SKB) we constructed before, we need to filter news from several web pages and find identical news to avoid data redundancy. In this paper, we propose a hierarchical blocking method that reduces the number of comparisons and narrows the scope by extracting news properties as blocking keys. The method transforms the event matching problem into a clustering problem. Experimental results show that the proposed method is superior to the existing text clustering algorithm, with high precision and fewer comparisons.
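
How blocking keys cut down the number of comparisons can be shown with a minimal sketch. It groups records by a sequence of keys, coarse to fine, and only generates candidate pairs inside each block. The property names (`type`, `location`) and the sample events are hypothetical; the abstract does not state which news properties the authors extract or how their hierarchy is built.

```python
from collections import defaultdict
from itertools import combinations


def hierarchical_blocks(events, keys):
    """Group events by a tuple of blocking keys, applied coarse to fine."""
    blocks = defaultdict(list)
    for event in events:
        blocks[tuple(event[key] for key in keys)].append(event)
    return blocks


def candidate_pairs(blocks):
    """Yield only the pairs of events that share a block."""
    for members in blocks.values():
        yield from combinations(members, 2)


events = [
    {"id": 1, "type": "fire", "location": "Wuhan"},
    {"id": 2, "type": "fire", "location": "Wuhan"},
    {"id": 3, "type": "flood", "location": "Wuhan"},
    {"id": 4, "type": "fire", "location": "Beijing"},
]

blocks = hierarchical_blocks(events, keys=("type", "location"))
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(blocks)]
all_pairs = len(events) * (len(events) - 1) // 2
print(pairs, "instead of", all_pairs, "pairwise comparisons")
```

Only the surviving candidate pairs would then be passed to a more expensive similarity or clustering step.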

Weighted neural bag-of-n-grams model: New baselines for text classification

26th International Conference on Computational Linguistics, COLING 2016

Authors: Li, Bofang; Zhao, Zhe; Liu, Tao; Wang, Puwei; Du, Xiaoyong. Affiliations: School of Information, Renmin University of China; Key laboratory of Data Engineering and Knowledge Engineering, MOE, China

ISBN: (print) 9784879747020

NBSVM is one of the most popular methods for text classification and has been widely used as a baseline for various text representation approaches. It uses Naive Bayes (NB) features to weight a sparse bag-of-n-grams representation. N-grams capture word order in short contexts, and the NB feature assigns more weight to important words. However, NBSVM suffers from a sparsity problem and is reported to be exceeded by newly proposed distributed (dense) text representations learned by neural networks. In this paper, we transfer the n-grams and NB weighting to neural models. We train n-gram embeddings and use NB weighting to guide the neural models to focus on important words. In fact, our methods can be viewed as distributed (dense) counterparts of the sparse bag-of-n-grams in NBSVM. We discover that n-grams and NB weighting are also effective in distributed representations. As a result, our models achieve new strong baselines on 9 text classification datasets. For example, on the IMDB dataset we reach an accuracy of 93.5%, which exceeds previous state-of-the-art results obtained by deep neural models. All source code is publicly available at https://***/zhezhaoa/neural-BOW-toolkit. © 1963-2018 ACL.

Keywords: Classification (of information)
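
The NB weighting that the abstract builds on can be sketched with the smoothed log-count ratio used in NBSVM: tokens that are much more frequent in one class get a weight far from zero, while class-neutral tokens get a weight near zero. The sketch below only computes these weights; how the paper feeds them into its neural models is not described in the abstract and is not shown. The function name, the smoothing value `alpha` and the toy documents are assumptions.

```python
import math
from collections import Counter


def nb_log_count_ratios(pos_docs, neg_docs, alpha=1.0):
    """Smoothed log-count ratio per token: log of (share of the token in
    positive documents) over (share of the token in negative documents)."""
    pos_counts = Counter(tok for doc in pos_docs for tok in doc)
    neg_counts = Counter(tok for doc in neg_docs for tok in doc)
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts[tok] + alpha for tok in vocab)
    neg_total = sum(neg_counts[tok] + alpha for tok in vocab)
    return {
        tok: math.log(
            ((pos_counts[tok] + alpha) / pos_total)
            / ((neg_counts[tok] + alpha) / neg_total)
        )
        for tok in vocab
    }


pos_docs = [["great", "movie"], ["great", "plot"]]
neg_docs = [["bad", "movie"], ["bad", "plot"]]
ratios = nb_log_count_ratios(pos_docs, neg_docs)
for token in sorted(ratios, key=ratios.get):
    print(f"{token:6s} {ratios[token]:+.3f}")
```

The same computation applies to n-grams if each document is given as a list of n-gram strings instead of single words.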

How China Deals with Big Data

Annals of Data Science, 2017, Vol. 4, No. 4, pp. 433-440

Authors: Shi, Yong; Shan, Zhiguang; Li, Jianping; Fang, Yufei. Affiliations: School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China; Informatization Research Department, State Information Center, Beijing 100045, China; Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China; School of Computer and Communication Engineering, University of Science and Technology, Beijing 100083, China; Informatization Research Department, State Information Center, Beijing 100045, China

On September 5, 2015, the State Council of the Chinese Government, China’s cabinet, formally announced its Action Framework for Promoting Big Data (***, 2015). This is a milestone for China in catching up with the global wave of big data. Since 2012, big data has been a hot issue for scientific communities as well as the governments of many countries (Lazer et al. in Science 343:1203–1205, 2014; Einav et al. in Science 345:715, 2014; Cate in Science 346:818, 2014; Khoury and Ioannidis in Science 346:1054–1055, 2014). At the 2013 G8 Summit, the leaders of Canada, France, Germany, Italy, Japan, Russia, U.S.A. and United Kingdom agreed on an "open government plan" (***/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex, 2013). China’s framework, however, mainly emphasizes the integration of all trans-departmental data and establishes a number of government-driven national big data platforms so as to provide big data services to research, the public and enterprises. The framework not only demonstrates a strong commitment of the Chinese government to big data, but also covers a far wider range of governmental branches, enterprises and institutions than the frameworks of other countries. In addition, the framework shows an interpretation of big data that differs from that of other countries. If its objective is achieved, China would become a strong "big data country". © 2017, Springer-Verlag GmbH Germany, part of Springer Nature.

Keywords: Big data

Utility-based anonymisation for dataset with multiple sensitive attributes

Authors: Wang, Lixia; Zhu, Qing. Affiliations: Key Laboratory for Data Engineering and Knowledge Engineering, MOE, School of Information, Renmin University of China, Beijing 100872, China

The privacy-preserving data publication problem has attracted more and more attention in recent years. Much related research has addressed datasets with a single sensitive attribute. However, an original dataset usually contains more than one sensitive attribute. In this paper, we apply the k-anonymity principle to solve the data publication problem for datasets with multiple sensitive attributes. We first cluster sensitive values based on a utility matrix. Then, we use a greedy strategy to partition tuples into equivalence classes. Our method can guarantee that the size of every equivalence class is k except the last one, which reduces information loss. We can also guarantee the diversity of sensitive values in an equivalence class, which protects privacy against the homogeneity attack. Experiments on a real dataset show that our method performs well on information loss, which indicates that we can guarantee data utility while protecting personal privacy. Copyright © 2016 Inderscience Enterprises Ltd.

Keywords: Equivalence classes
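
The property that every equivalence class has size k except the last can be illustrated with a much simplified sketch. It sorts records on one quasi-identifier, cuts them into groups of k, folds any short remainder into the final group, and generalizes the quasi-identifier to a range. This is not the authors' method: their utility matrix, the clustering of sensitive values and the greedy strategy are not specified in the abstract. The field names `age` and `disease` and all function names are hypothetical.

```python
def partition_k(records, k, qi="age"):
    """Split records into equivalence classes of size k; the last class
    absorbs the remainder, so it holds between k and 2k - 1 records."""
    if len(records) < k:
        raise ValueError("fewer than k records cannot be k-anonymous")
    ordered = sorted(records, key=lambda r: r[qi])
    classes = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(classes[-1]) < k:
        leftover = classes.pop()
        classes[-1].extend(leftover)
    return classes


def generalize(classes, qi="age"):
    """Replace the quasi-identifier by the range covered by its class."""
    published = []
    for group in classes:
        low = min(r[qi] for r in group)
        high = max(r[qi] for r in group)
        published.extend({**r, qi: f"{low}-{high}"} for r in group)
    return published


def distinct_sensitive(group, sensitive="disease"):
    """Number of different sensitive values in one class (a diversity check)."""
    return len({r[sensitive] for r in group})


rows = [(23, "flu"), (25, "cold"), (31, "flu"), (35, "asthma"),
        (41, "cold"), (47, "flu"), (52, "asthma")]
records = [{"age": age, "disease": disease} for age, disease in rows]

classes = partition_k(records, k=3)
for group in classes:
    print(len(group), distinct_sensitive(group))
print(generalize(classes)[0])
```

A class whose `distinct_sensitive` value is 1 would be open to the homogeneity attack mentioned in the abstract, since every record in it reveals the same sensitive value.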

Accuracy estimation of link-based similarity measures and its application

Frontiers of Computer Science, 2016, Vol. 10, No. 1, pp. 113-123

Authors: Yinglong ZHANG; Cuiping LI; Chengwang XIE; Hong CHEN. Affiliations: School of Software, East China Jiaotong University, Nanchang 330045, China; Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, and Department of Computer Science, Renmin University of China, Beijing 100872, China; Intelligent Optimization and Information Processing Laboratory, East China Jiaotong University, Nanchang 330013, China

Link-based similarity measures play a significant role in many graph-based applications. Consequently, measuring node similarity in a graph is a fundamental problem of graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. Recently, a novel link-based similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR and P-Rank scores are calculated by iterative methods. As the number of iterations increases, so does the overhead of the calculation. Ideally, similarity would be computed in the minimum number of iterations sufficient to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, we focus on designing accurate and tight upper bounds for PPR, SR and P-Rank in this paper. Our upper bounds are designed based on the following intuition: the smaller the difference between two consecutive iteration steps is, the smaller the difference between the theoretical and iterative similarity scores becomes. Furthermore, we demonstrate the effectiveness of our upper bounds in the scenario of top-k similar-node queries, where our upper bounds help accelerate the query. We also run a comprehensive set of experiments on real-world data sets to verify the effectiveness and efficiency of our upper bounds.

Keywords: personalized PageRank; SimRank; P-Rank; upper bound
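
The intuition in the abstract, that a small change between consecutive iterations implies a small remaining error, can be sketched for Personalized PageRank. The stopping rule below uses the standard contraction-mapping bound, (1 - alpha) / alpha times the L1 difference between two consecutive iterates, and not the tighter bounds designed in the paper. It assumes every node has at least one out-link and that every link target is itself a key of `out_links`; the graph, the parameter names and the default `alpha` are illustrative choices.

```python
def personalized_pagerank(out_links, source, alpha=0.15, tol=1e-8, max_iter=1000):
    """Iterate PPR from `source` until the error bound drops below `tol`.

    Returns (scores, iterations used, final error bound in L1 norm).
    """
    nodes = list(out_links)
    score = {v: 0.0 for v in nodes}
    score[source] = 1.0
    bound = float("inf")
    for iteration in range(1, max_iter + 1):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass returns to the source
        for u in nodes:
            share = (1 - alpha) * score[u] / len(out_links[u])
            for v in out_links[u]:
                nxt[v] += share
        diff = sum(abs(nxt[v] - score[v]) for v in nodes)
        score = nxt
        # Contraction bound on the L1 distance to the exact PPR vector.
        bound = (1 - alpha) / alpha * diff
        if bound < tol:
            break
    return score, iteration, bound


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores, iterations, bound = personalized_pagerank(graph, source="a")
print(iterations, bound)
print({node: round(value, 4) for node, value in scores.items()})
```

A tighter bound lets the loop stop earlier for the same guaranteed accuracy, which is the saving the paper targets for top-k queries.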

Examining scientific writing styles from the perspective of linguistic complexity

arXiv, 2018

Authors: Lu, Chao; Bu, Yi; Wang, Jie; Ding, Ying; Torvik, Vetle; Schnaars, Matthew; Zhang, Chengzhi. Affiliations: School of Economics and Management, Nanjing University of Science and Technology, Nanjing, Jiangsu, China; School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, United States; Center for Complex Networks and Systems Research, School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, United States; School of Information Management, Nanjing University, Nanjing, Jiangsu, China; School of Information Management, Wuhan University, Wuhan, Hubei, China; School of Information Sciences, University of Illinois, Urbana, IL, United States; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing University, Nanjing, Jiangsu, China

Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. In order to uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactic and lexical complexity. Copyright © 2018, The Authors. All rights reserved.
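
Two of the measurement families named in the abstract can be sketched in a few lines: mean sentence length as a simple syntactic measure and type-token ratio as a simple lexical-diversity measure. These are common textbook definitions and may differ from the exact measures used in the paper; the regular-expression sentence splitter and tokenizer are rough assumptions, and lexical density and sophistication are left out because they need part-of-speech tags and word-frequency lists.

```python
import re


def split_sentences(text):
    """Naive split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Lower-cased alphabetic tokens, keeping simple apostrophe forms."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def mean_sentence_length(text):
    """Average number of word tokens per sentence."""
    sentences = split_sentences(text)
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def type_token_ratio(text):
    """Distinct word forms divided by total word tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens)


sample = "The cat sat. The cat ran away quickly."
print(mean_sentence_length(sample))
print(type_token_ratio(sample))
```

Type-token ratio falls as texts get longer, so comparisons between groups are usually made on samples of equal length.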

The Tourism-Specific Sentiment Vector Construction Based on Kernel Optimization Function

Procedia Computer Science, 2017, Vol. 122, pp. 1162-1167

Authors: Luyao Zhu; Wei Li; Kun Guo; Yong Shi; Yuanchun Zheng. Affiliations: School of Economics and Management, University of Chinese Academy of Sciences, Beijing, China; Fictitious Economy & Data Science Research Center, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China

Sentiment analysis in the tourism domain has drawn much attention in the past few years, which calls for a more precise sentiment word embedding method. This article proposes a kernel optimization function for sentiment word embedding. The method aims to integrate semantic, statistical and sentiment information while maintaining the similarity between sentiment words in terms of sentiment orientation. The experimental results show that the optimized sentiment vectors successfully capture features relating to sentiment information and to the difference between the concreteness and abstractness of sentiment words.

Keywords: kernel function; sentiment vector; word embedding; sentiment analysis
