Search Results - Inner Mongolia University Library

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, vol. 8801, pp. 95-106

Authors: Ma, Shutian; Zhang, Chengzhi. Affiliations: Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing University, Nanjing 210093, China

As parallel corpora are an important resource for machine translation and cross-language information retrieval, the collection of large-scale parallel corpora has received wide attention. With the development of the Internet, researchers have begun to mine parallel corpora from multilingual websites. They use prior knowledge such as ad hoc heuristics, or calculate the similarity of webpage structure and content, to find bilingual webpages. This paper presents a method that uses a search engine and a little prior knowledge about URL patterns to obtain bilingual websites from the Internet. The method has a low time cost and needs no large-scale computation for URL pattern matching. We have collected 88,915 candidate parallel Chinese-English webpages, with an average accuracy of around 90.8%. During the evaluation, the true bilingual websites that we found had highly similar HTML structure and good-quality translations. © Springer International Publishing Switzerland 2014.

Keywords: Websites

Image Segmentation via Improving Clustering Algorithms with Density and Distance

Procedia Computer Science, 2015, vol. 55, pp. 1015-1022

Authors: Zhensong Chen; Zhiquan Qi; Fan Meng; Limeng Cui; Yong Shi. Affiliations: School of Management, University of Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, CAS, Beijing 100190, China; Sino-Danish Center for Education and Research, Beijing 100190, China; Research Center on Fictitious Economy and Data Science, CAS, Beijing 100190, China; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190, China

Image segmentation is a fundamental task in computer vision and image processing applications. It is well known that the performance of image segmentation is mainly influenced by two factors: the segmentation approach and the feature representation. Among image segmentation methods, clustering is one of the most popular approaches. However, most current clustering-based segmentation methods have some problems: the number of image regions has to be given a priori, different initial cluster centers produce different segmentation results, and so on. In this paper, we present a novel image segmentation approach based on the DP clustering algorithm. Compared with current methods, our method has the following advantages: 1) the algorithm can directly give the number of clusters in the image based on the decision graph; 2) the cluster centers can be identified correctly; 3) hierarchical segmentation can be achieved simply, according to the application's requirements. Extensive experiments demonstrate the validity of this novel segmentation algorithm.

Keywords: Image Segmentation; Clustering Algorithm; Feature Representation

Detecting incorrect numerical data in DBpedia

11th International Conference on Semantic Web: Trends and Challenges, ESWC 2014

Authors: Wienand, Dominik; Paulheim, Heiko. Affiliations: Technische Universität Darmstadt, Knowledge Engineering Group, Germany; University of Mannheim, Research Group Data and Web Science, Germany

ISBN: (print) 9783319074429

DBpedia is a central hub of Linked Open Data (LOD). Being based on crowd-sourced contents and heuristic extraction methods, it is not free of errors. In this paper, we study the application of unsupervised numerical outlier detection methods to DBpedia, using Interquartile Range (IQR), Kernel Density Estimation (KDE), and various dispersion estimators, combined with different semantic grouping methods. Our approach reaches 87% precision, and has led to the identification of 11 systematic errors in the DBpedia extraction framework. © 2014 Springer International Publishing.

Keywords: Numerical methods
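
As a rough illustration of the IQR-based outlier detection named in the abstract above, the following Python sketch flags values that fall outside the conventional 1.5 × IQR fences. The function name `iqr_outliers`, the multiplier `k`, and the sample population figures are illustrative assumptions, not taken from the paper; the semantic grouping step that the abstract also mentions is not shown.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical population figures for one group of similar resources;
# the last value mimics an extraction error (two extra digits).
populations = [5200, 4800, 5100, 4950, 5050, 5300, 4700, 510000]
print(iqr_outliers(populations))
```

In this sketch the fences are computed over one list at a time, so running it separately per group of comparable resources is one simple way to approximate the idea of grouping before testing.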

Learning semantically coherent rules

1st International Workshop on Data Mining and Natural Language Processing, DMNLP 2014

Authors: Gabriel, Alexander; Paulheim, Heiko; Janssen, Frederik. Affiliations: Knowledge Engineering Group, Technische Universität Darmstadt, Germany; Research Group Data and Web Science, University of Mannheim, Germany

The capability of building a model that can be understood and interpreted by humans is one of the main selling points of symbolic machine learning algorithms, such as rule or decision tree learners. However, those algorithms are most often optimized w.r.t. classification accuracy, but not the understandability of the resulting model. In this paper, we focus on a particular aspect of understandability, i.e., semantic coherence. We introduce a variant of a separate-and-conquer rule learning algorithm using a WordNet-based heuristic to learn rules that are semantically coherent. In an evaluation on different datasets, we show that the approach learns rules that are significantly more semantically coherent, without losing accuracy. Copyright © by the paper's authors.

Keywords: Semantics

A fast algorithm to building a fuzzy rough classifier

13th International Conference on Machine Learning and Cybernetics, ICMLC 2014

Authors: Tsang, Eric C. C.; Zhao, Suyun. Affiliations: The Macau University of Science and Technology, Taipa, Macau, China; Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, Beijing 100872, China

ISBN: (print) 9783662456514

In this paper, by strict mathematical reasoning, we discover the relation between the similarity relation and the lower approximation. Based on this relation, we design a fast algorithm to build a rule-based fuzzy rough classifier. Finally, numerical experiments demonstrate the efficiency and effectiveness of the proposed algorithm. © Springer-Verlag Berlin Heidelberg 2014.

Keywords: Rough set theory

Personalized Financial News Recommendation Algorithm Based on Ontology

Procedia Computer Science, 2015, vol. 55, pp. 843-851

Authors: Rui Ren; Lingling Zhang; Limeng Cui; Bo Deng; Yong Shi. Affiliations: School of Management, University of Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China; Research Centre on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

To deal with the challenge of information overload, we propose a financial news recommendation algorithm that helps users find articles that are interesting to read. To resolve the ambiguity problem, a newly presented OF-IDF method is employed to represent unstructured text data in the form of key concepts, synonyms, and synsets, all of which are stored in the domain ontology. For users, the recommendation algorithm builds profiles based on their behaviors to detect their genuine interests, and predicts current interests automatically and in real time by applying the idea of relevance feedback. Finally, an experiment conducted on a financial news dataset demonstrates that the proposed algorithm significantly outperforms a traditional recommender.

Keywords: news recommendation algorithm ontology relevance feedback OF-IDF

Few-example video event retrieval using tag propagation

2014 4th ACM International Conference on Multimedia Retrieval, ICMR 2014

Authors: Mazloom, Masoud; Li, Xirong; Snoek, Cees G. M. Affiliations: ISLA, Faculty of Science, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, Netherlands; Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China

ISBN: (print) 1595930361

An emerging topic in multimedia retrieval is to detect a complex event in video using only a handful of video examples. Different from existing work, which learns a ranker from positive video examples and hundreds of negative examples, we aim to query web video for events using zero or only a few visual examples. To that end, we propose in this paper a tag-based video retrieval system which propagates tags from a tagged video source to an unlabeled video collection without the need for any training examples. Our algorithm is based on weighted frequency neighbor voting using concept vector similarity. Once tags are propagated to unlabeled video, we can rely on off-the-shelf language models to rank these videos by tag similarity. We study the behavior of our tag-based video event retrieval system by performing three experiments on web videos from the TRECVID multimedia event detection corpus, with zero, one, and multiple query examples, in which it beats a recent alternative. Copyright 2014 ACM.

Keywords: Image retrieval
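
The neighbor-voting idea in the abstract above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: cosine similarity stands in for "concept vector similarity", each neighbor's vote is weighted by that similarity, and the names `cosine`, `propagate_tags`, `k`, and the toy vectors and tags are all hypothetical rather than taken from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_tags(query_vec, tagged, k=2):
    """Score tags for an unlabeled video from its k most similar tagged videos.

    `tagged` is a list of (concept_vector, tags) pairs. Each neighbor adds
    its similarity to the score of every tag it carries.
    """
    neighbours = sorted(
        tagged, key=lambda item: cosine(query_vec, item[0]), reverse=True
    )[:k]
    votes = defaultdict(float)
    for vec, tags in neighbours:
        weight = cosine(query_vec, vec)
        for tag in tags:
            votes[tag] += weight
    # Highest-scoring tags first.
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


# Toy tagged source: two-dimensional concept vectors with user tags.
tagged_videos = [
    ([1.0, 0.0], ["dog", "park"]),
    ([0.9, 0.1], ["dog"]),
    ([0.0, 1.0], ["cooking"]),
]
print(propagate_tags([1.0, 0.0], tagged_videos, k=2))
```

The ranked tag list could then be matched against a textual event query, which is the role the abstract assigns to off-the-shelf language models.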

The linked data mining challenge 2014 results and experiences

3rd International Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, Know@LOD 2014, Co-located with 11th Extended Semantic Web Conference, ESWC 2014

Authors: Svátek, Vojtěch; Mynarz, Jindřich; Paulheim, Heiko. Affiliations: University of Economics, Department of Information and Knowledge Engineering, Prague, Czech Republic; University of Mannheim, Germany; Research Group Data and Web Science, Germany

The 2014 edition of the Linked Data Mining Challenge, conducted in conjunction with Know@LOD 2014, was the third edition of this challenge. The underlying data came from two domains: public procurement and researcher collaboration. As in the previous year, when the challenge was held at the Data Mining on Linked Data workshop co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2013), the response to the challenge was lower than expected, with only one solution submitted for the predictive task this year. We have tried to track the reasons for the continuously low participation in the challenge via a questionnaire survey, and have distilled principles that could help organizers of similar future challenges.

Keywords: Linked data

A fast noise resilient anomaly detection using GMM-based collective labelling

Science and Information Conference (SAI)

Authors: Elnaz Bigdeli; Bijan Raahemi; Mahdi Mohammadi; Stan Matwin. Affiliations: School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada; Telfer School of Management, Knowledge Discovery and Data Mining lab, University of Ottawa E. Ottawa, ON, Canada; Department of Computing, Dalhousie University, Halifax, NS, Canada; Polish Academy of Sciences, Warsaw, Poland

Anomaly detection algorithms face several challenges, including computational complexity and resilience to noise in the input data. In this paper, we propose a fast and noise-resilient cluster-based anomaly detection method using a collective labelling approach. In the proposed Collective Probabilistic Anomaly Detection method, first, instead of labelling each new sample (as normal or anomalous) individually, the new samples are clustered and then labelled. This collective labelling mitigates the negative impact of noise by relying on group behaviour rather than on the individual characteristics of incoming samples. Second, since grouping and labelling new samples may be time-consuming, we summarize clusters using a Gaussian Mixture Model (GMM). Not only does the GMM offer faster processing; it also facilitates summarizing clusters of arbitrary shape and consequently reduces the memory requirement. Finally, a modified distance measure, based on the Kullback-Leibler method, is proposed to calculate the similarity among clusters represented by GMMs. We evaluate the proposed method on various datasets by measuring its false alarm rate, detection rate, and memory requirement. We also add different levels of noise to the input datasets to demonstrate the performance of the proposed collective anomaly detection method in the presence of noise. The experimental results confirm the superior performance of the proposed method compared to individual labelling techniques in terms of memory usage, detection rate, and false alarm rate.

Keywords: Noise; Shape; Training; Gaussian mixture model; Clustering algorithms; Support vector machines
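
The abstract above refers to a modified distance measure based on the Kullback-Leibler method for comparing clusters summarized by GMMs, but the modification itself is not described on this page. As a minimal sketch of the underlying quantity only, the Python block below computes the standard closed-form KL divergence between two univariate Gaussians and a symmetrized version of it. The function names and the single-component, one-dimensional setting are simplifying assumptions, not the paper's measure.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for univariate Gaussians P and Q.

    KL = ln(sigma_q / sigma_p)
         + (sigma_p^2 + (mu_p - mu_q)^2) / (2 * sigma_q^2) - 1/2
    """
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


def symmetric_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Symmetrized divergence: KL(P || Q) + KL(Q || P)."""
    return kl_gaussian(mu_p, sigma_p, mu_q, sigma_q) + kl_gaussian(
        mu_q, sigma_q, mu_p, sigma_p
    )


# Two unit-variance components whose means differ by one standard deviation.
print(kl_gaussian(0.0, 1.0, 1.0, 1.0))   # 0.5
print(symmetric_kl(0.0, 1.0, 1.0, 1.0))  # 1.0
```

Plain KL divergence is asymmetric, which is why a symmetrized form is shown as one common way to obtain a distance-like score. The divergence between full mixtures has no closed form and generally has to be approximated.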

Identifying disputed topics in the news

1st Workshop on Linked Data for Knowledge Discovery, LD4KD 2014 - Co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECMLPKDD 2014

Authors: De Clercq, Orphee; Hertling, Sven; Hoste, Veronique; Ponzetto, Simone Paolo; Paulheim, Heiko. Affiliations: LT3 Language and Translation Technology Team, Ghent University, Belgium; Knowledge Engineering Group, Technische Universität Darmstadt, Germany; Research Group Data and Web Science, University of Mannheim, Germany

News articles often reflect an opinion or point of view, with certain topics evoking more diverse opinions than others. For analyzing and better understanding public discourses, identifying such contested topics constitutes an interesting research question. In this paper, we describe an approach that combines NLP techniques and background knowledge from DBpedia for finding disputed topics in news sites. To identify these topics, we annotate each article with DBpedia concepts, extract their categories, and compute a sentiment score in order to identify those categories revealing significant deviations in polarity across different media. We illustrate our approach in a qualitative evaluation on a sample of six popular British and American news sites. Copyright © 2014 for the individual papers by the papers' authors.

Keywords: Sentiment analysis
