In order to retrieve unlabeled images by textual queries, cross-media similarity computation is a key ingredient. Although novel methods are continuously introduced, little has been done to evaluate these methods to...
Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. In order ...
Information entropy and its extensions, which are important generalizations of entropy, have been applied in many research domains today. In this paper, a novel generalized relative entropy is constructed to avoid some ...
The privacy-preserving data publication problem has attracted more and more attention in recent years. A lot of related research has been done on datasets with a single sensitive attribute. However, usually, ori...
On September 5, 2015, the State Council of the Chinese Government, China's cabinet, formally announced its Action Framework for Promoting Big Data (***, 2015). This is a milestone for China in catching up with the global wave o...
With the rapid development of Internet technology, crowdsourcing has begun to receive more and more attention as a flexible, effective and low-cost problem-solving method, and using crowdsourcing to evaluate the quality of linked data has become a research hotspot. This paper proposes the Domain Specialization Test (DST), which uses domain-specific test tasks to evaluate the professionalism of workers, and combines the idea of Mini-batch Gradient Descent (MBGD) with the EM algorithm to obtain the MBEM algorithm, which evaluates task results efficiently and accurately. Experimental results show that the proposed method can screen out appropriate workers for linked-data crowdsourcing tasks and improve both the accuracy of the results and the iteration efficiency.
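The abstract does not spell out the MBEM update rule, so the following is only a rough Python sketch of the general idea: a Dawid-Skene-style EM for binary crowd labels whose M-step, in the spirit of mini-batch gradient descent, updates worker reliabilities from a random mini-batch of tasks rather than from the full dataset. The function name, the symmetric worker-accuracy model, and the learning rate are illustrative assumptions, not the paper's implementation.

```python
# Sketch: mini-batch EM for aggregating binary crowd labels (assumed model,
# not the paper's exact MBEM algorithm).
import random
from collections import defaultdict

def mini_batch_em(answers, n_workers, n_tasks, batch_size=32,
                  n_epochs=20, lr=0.5, seed=0):
    """answers: list of (task_id, worker_id, label) triples, label in {0, 1}."""
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for t, w, l in answers:
        by_task[t].append((w, l))

    accuracy = [0.7] * n_workers          # initial guess of each worker's reliability
    posterior = [0.5] * n_tasks           # current P(true label = 1) per task

    for _ in range(n_epochs):
        task_ids = list(by_task)
        rng.shuffle(task_ids)
        for start in range(0, len(task_ids), batch_size):
            batch = task_ids[start:start + batch_size]
            # E-step on the mini-batch: posterior of each task's true label.
            for t in batch:
                odds = 1.0
                for w, l in by_task[t]:
                    p = min(max(accuracy[w], 1e-3), 1 - 1e-3)
                    odds *= p / (1 - p) if l == 1 else (1 - p) / p
                posterior[t] = odds / (1.0 + odds)
            # Partial M-step: nudge reliabilities toward the mini-batch estimate
            # instead of recomputing them from the whole dataset.
            agree, count = defaultdict(float), defaultdict(float)
            for t in batch:
                for w, l in by_task[t]:
                    agree[w] += posterior[t] if l == 1 else 1 - posterior[t]
                    count[w] += 1.0
            for w in count:
                accuracy[w] += lr * (agree[w] / count[w] - accuracy[w])

    labels = [1 if p >= 0.5 else 0 for p in posterior]
    return labels, accuracy

if __name__ == "__main__":
    # three workers label two tasks; worker 2 disagrees with the majority
    answers = [(0, 0, 1), (0, 1, 1), (0, 2, 0),
               (1, 0, 0), (1, 1, 0), (1, 2, 1)]
    print(mini_batch_em(answers, n_workers=3, n_tasks=2, batch_size=2))
```

In this sketch the partial M-step plays the role of the MBGD idea: each mini-batch only nudges the reliability estimates, which keeps individual iterations cheap while the estimates still converge over epochs.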
With the extensive application of knowledge bases (KBs), how to complete them is a hot topic on the Semantic Web. However, many problems come with big data, and event matching is one of them: finding the entities that refer to the same real-world things, which is also the key step in the extension process. To enrich the emergency knowledge base (E-SKB) we constructed before, we need to filter news from several web pages and identify the same news items to avoid data redundancy. In this paper, we propose a hierarchical blocking method that extracts news properties as blocking keys to reduce the number of comparisons and narrow down the search scope. The method transforms the event matching problem into a clustering problem. Experimental results show that the proposed method outperforms existing text clustering algorithms, achieving high precision with fewer comparisons.
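As a loose illustration of the blocking idea (the actual properties and matching criteria used for the E-SKB are not given here), the sketch below groups news items level by level on extracted properties and only generates candidate pairs inside the finest blocks, so the number of comparisons drops from all pairs to pairs that share every blocking key. Property names such as date and location are assumptions for the example.

```python
# Sketch: hierarchical blocking on extracted news properties (assumed keys).
from collections import defaultdict
from itertools import combinations

def hierarchical_blocking(news_items, keys=("date", "location")):
    """news_items: list of dicts, each holding extracted event properties."""
    blocks = {(): list(range(len(news_items)))}
    for key in keys:                      # refine the blocks one level at a time
        refined = defaultdict(list)
        for prefix, members in blocks.items():
            for i in members:
                refined[prefix + (news_items[i].get(key),)].append(i)
        blocks = refined
    return blocks

def candidate_pairs(blocks):
    """Only items that share every blocking key are ever compared."""
    for members in blocks.values():
        yield from combinations(members, 2)

if __name__ == "__main__":
    items = [
        {"date": "2015-08-12", "location": "Tianjin", "title": "Port explosion"},
        {"date": "2015-08-12", "location": "Tianjin", "title": "Blast at port"},
        {"date": "2015-08-12", "location": "Beijing", "title": "Unrelated story"},
    ]
    print(list(candidate_pairs(hierarchical_blocking(items))))  # only (0, 1)
```

Clustering within each finest block (for example on titles or extracted entities) would then decide which of the remaining candidate pairs actually describe the same event.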
Link-based similarity measures play a significant role in many graph-based applications. Consequently, measuring node similarity in a graph is a fundamental problem of graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. Recently, a novel link-based similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR and P-Rank scores are calculated by iterative methods. As the number of iterations increases, so does the overhead of the calculation. The ideal solution is to compute similarity within the minimum number of iterations that suffices to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, in this paper we focus on designing accurate and tight upper bounds for PPR, SR, and P-Rank. Our upper bounds are designed based on the following intuition: the smaller the difference between two consecutive iteration steps is, the smaller the difference between the theoretical and iterative similarity scores becomes. Furthermore, we demonstrate the effectiveness of our upper bounds in the scenario of top-k similar node queries, where they help accelerate the query. We also run a comprehensive set of experiments on real-world data sets to verify the effectiveness and efficiency of our upper bounds.
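For Personalized PageRank alone, the intuition can be made concrete with the standard geometric-series argument: the iteration is a contraction with factor c (the damping factor), so the L1 gap to the exact scores is at most c/(1-c) times the gap between two consecutive iterates. The sketch below uses that a-posteriori bound as a stopping rule; the paper's own bounds for PPR, SimRank, and P-Rank are its contribution and may well be tighter, so this is only an assumed illustration of how such a bound is applied.

```python
# Sketch: power iteration for Personalized PageRank with a stopping rule based
# on the standard consecutive-iterate bound (not the paper's specific bounds).
import numpy as np

def ppr_with_stopping(adjacency, seed, c=0.85, eps=1e-6, max_iter=1000):
    """adjacency: (n, n) array of 0/1 edge weights; seed: index of the query node."""
    n = adjacency.shape[0]
    out_deg = adjacency.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1.0            # guard against dangling nodes
    P = adjacency / out_deg                # row-stochastic transition matrix
    e = np.zeros(n)
    e[seed] = 1.0
    s = e.copy()
    for k in range(1, max_iter + 1):
        s_next = (1 - c) * e + c * (P.T @ s)
        diff = np.abs(s_next - s).sum()    # gap between consecutive iterates
        s = s_next
        # a-posteriori bound: ||s_exact - s_k||_1 <= c/(1-c) * ||s_k - s_{k-1}||_1
        if c / (1 - c) * diff <= eps:
            return s, k
    return s, max_iter

if __name__ == "__main__":
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    scores, iters = ppr_with_stopping(A, seed=0)
    print(iters, np.round(scores, 4))
```

The same pattern carries over to top-k queries: once the bound certifies that no score can change by more than the current margin between the k-th and (k+1)-th candidates, the iteration can stop early.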
This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retri...