Search Results - Inner Mongolia University Library

17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013

Authors: Hassani, Marwan; Kim, Yunsu; Choi, Seungjin; Seidl, Thomas. Affiliations: Data Management and Data Exploration Group, RWTH Aachen University, Germany; Department of Computer Science and Engineering, Pohang University of Science and Technology, Republic of Korea

ISBN: (print) 9783642403187

Nowadays, most streaming data sources are becoming high-dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained significant importance. However, existing subspace clustering evaluation measures are mainly designed for static data and cannot reflect the quality of the evolving nature of data streams. On the other hand, available stream clustering evaluation measures consider only the errors of the full-space clustering, not the quality of subspace clustering. In this paper we propose, to the best of our knowledge, the first subspace clustering measure that is designed for streaming data, called SubCMM: Subspace Cluster Mapping Measure. SubCMM is an effective evaluation measure for stream subspace clustering that is able to handle errors caused by emerging, moving, or splitting subspace clusters. Additionally, we propose a novel method for using available offline subspace clustering measures for data streams within the Subspace MOA framework. © Springer-Verlag 2013.

Keywords: Quality control

Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems

Journal of Computer Science & Technology, 2018, Vol. 33, No. 1, pp. 58-78

Authors: Jian Liu; Yun-Peng Chai; Xiao Qin; Yao-Hong Liu. Affiliations: Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA 70803, U.S.A.; Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education of China, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China; Shelby Center for Engineering Technology, Department of Computer Science and Software Engineering, Samuel Ginn College of Engineering, Auburn University, Auburn, AL 36849-5347, U.S.A.

Deduplication has been commonly used in both enterprise storage systems and cloud storage. To overcome the performance challenge for the selective restore operations of deduplication systems, a solid-state-drive-based (i.e., SSD-based) read cache can be deployed to speed up restores by caching popular restore contents dynamically. Unfortunately, frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSDs' lifetime while slowing down I/O processes in SSDs. To address this problem, we propose a new solution, LOP-Cache, to greatly improve the write durability of SSDs as well as I/O performance by enlarging the proportion of long-term popular (LOP) data among data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long time period to decrease the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache with only 5.56% capacity of the deduplicated data. Importantly, LOP-Cache improves SSDs' lifetime by a factor of 9.77. The evidence shows that LOP-Cache offers a cost-efficient SSD-based read cache solution to boost performance of selective restore for deduplication systems.

Keywords: data deduplication solid state drive (SSD) flash cache endurance

Arabic/English word translation disambiguation using parallel corpora and matching schemes

12th Conference of the European Association for Machine Translation, EAMT 2008

Authors: Ahmed, Farag; Nürnberger, Andreas. Affiliations: Data and Knowledge Engineering Group, Faculty of Computer Science, Otto-von-Guericke-University of Magdeburg, Building 29, Universitätsplatz 2, 39106 Magdeburg, Germany

ISBN: (print) 9783000257704

The limited coverage of available Arabic language lexicons poses a serious challenge in Arabic cross-language information retrieval. Translation in cross-language information retrieval consists of assigning one of the semantic representation terms in the target language to the intended query. Besides the problem of the completeness of the dictionary, we also face the problem of deciding which of the translations proposed by the dictionary for each query term should be included in the query translations. In this paper, we describe the implementation and evaluation of an Arabic/English word translation disambiguation approach that is based on exploiting a large bilingual corpus and statistical co-occurrence to find the correct sense for the query translation terms. The correct word translations of the given query term are determined based on their cohesion with words in the training corpus and a special similarity score measure. The specific properties of the Arabic language that frequently hinder the correct match are taken into account.

Keywords: Semantics

CATIRI: An Efficient Method for Content-and-Text Based Image Retrieval

Journal of Computer Science & Technology, 2019, Vol. 34, No. 2, pp. 287-304

Authors: Mengqi Zeng; Bin Yao; Zhi-Jie Wang; Yanyan Shen; Feifei Li; Jianfeng Zhang; Hao Lin; Minyi Guo. Affiliations: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, China; National Engineering Laboratory for Big Data Analysis and Applications, Beijing 100871, China; School of Computing, University of Utah, Salt Lake City 84112, U.S.A.; Alibaba Group, Hangzhou 311121, China

The combination of visual and textual information in image retrieval remarkably alleviates the semantic gap of traditional image retrieval methods, and thus it has attracted much attention *** retrieval based on such a combination is usually called the content-and-text based image retrieval (CTBIR). Nevertheless, existing studies in CTBIR mainly make efforts on improving the retrieval *** the best of our knowledge, little attention has been focused on how to enhance the retrieval ***, image data is widespread and expanding rapidly in our daily ***, it is important and interesting to investigate the retrieval *** this end, this paper presents an efficient image retrieval method named CATIRI (content-and-text based image retrieval using indexing). CATIRI follows a three-phase solution framework that develops a new indexing structure called *** MHIM-tree seamlessly integrates several elements including Manhattan Hashing, Inverted index, and *** use our MHIM-tree wisely in the query, we present a set of important metrics and reveal their inherent *** on them, we develop a top-k query algorithm for *** results based on benchmark image datasets demonstrate that CATIRI outperforms the competitors by an order of magnitude.

Keywords: image retrieval text-and-visual feature indexing top-k

Augmenting bag-of-words - Category specific features and concept reasoning

2010 Cross Language Evaluation Forum Conference, CLEF 2010

Authors: Mbanya, Eugene; Hentschel, Christian; Gerke, Sebastian; Liu, Mohan; Nürnberger, Andreas; Ndjiki-Nya, Patrick. Affiliations: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Germany; Data and Knowledge Engineering Group, Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany

In this paper we present our approach to the 2010 ImageCLEF PhotoAnnotation task. Based on the well-known bag-of-words approach, we suggest two extensions. First, we analyzed the impact of category-specific features and classifiers. In order to classify quality-related image categories, we implemented a sharpness measure and used it as an additional feature in the classification process. Second, we propose a post-classification step, which is based on the observation that many of the categories should be considered as being related to each other: some categories exclude or allow for inference to others. We incorporate inference and exclusion rules by refining the classification results. The results we obtain show that both extensions can provide a classification performance increase when compared to the standard BoW approach.

Keywords: Classification (of information)

Modeling location-based profiles of social image media using explorative pattern mining

2011 IEEE International Conference on Privacy, Security, Risk and Trust, PASSAT 2011 and 2011 IEEE International Conference on Social Computing, SocialCom 2011

Authors: Lemmerich, Florian; Atzmueller, Martin. Affiliations: Artificial Intelligence and Applied Computer Science, University of Würzburg, 97074 Würzburg, Germany; Knowledge and Data Engineering Group, University of Kassel, 34121 Kassel, Germany

ISBN: (print) 9780769545783

This paper presents an approach for modeling location-based profiles of social image media based on tagging information and collaborative geo-reference annotations. We utilize pattern mining techniques for obtaining sets of tags that are specific for the specified point, landmark, or region of interest. Next, we show how these candidate patterns can be presented and visualized for interactive exploration using a combination of general pattern mining visualizations and views specialized on geo-referenced tagging data. We present a case study using publicly available data from the Flickr photo sharing application. © 2011 IEEE.

Keywords: data mining

Assessing the quality of R2RML mappings

Joint 1st International Workshop on Semantics for Transport and the 1st International Workshop on Approaches for Making Data Interoperable, SEM4TRA-AMAR 2019

Authors: Crotti, Ademar; Debattista, Jeremy; O’Sullivan, Declan. Affiliations: ADAPT Centre for Digital Content Platform Research, Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2, Ireland

This paper presents an approach to assess the quality of mappings used to generate RDF datasets. Data quality is a multidimensional concept determined by many factors which influence the extent to which a dataset is useful for a particular task. Several solutions have been proposed in the literature to assess the quality of RDF datasets. Nonetheless, in most cases, these solutions focus on the resulting datasets and not on the artefacts used to generate them. In this paper, we propose the use of metrics commonly used to assess the quality of such datasets to evaluate the mappings used to generate them. The goal is to assist data providers in producing high-quality datasets by extending such quality assessment procedures to cover the start of the publishing process. We provide an implementation of the approach by extending an existing quality assessment framework, which is then evaluated using real-world use cases. Preliminary results show that the assessment of mappings is capable of identifying quality issues for the observed cases. Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Keywords: Mapping

Senzi: A sentiment analysis lexicon for the latinised Arabic (Arabizi)

12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019

Authors: Tobaili, Taha; Fernandez, Miriam; Alani, Harith; Sharafeddine, Sanaa; Hajj, Hazem; Glavaš, Goran. Affiliations: Knowledge Media Institute, Open University, United Kingdom; Department of Computer Science and Mathematics, Lebanese American University, Lebanon; Department of Electrical and Computer Engineering, American University of Beirut, Lebanon; Data and Web Science Group, Universität Mannheim, Germany

ISBN: (print) 9789544520557

Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It has a proven popularity on chat platforms and social media, yet it suffers from a severe lack of natural language processing (NLP) resources. As such, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic. In this paper we describe the creation of a sentiment lexicon for Arabizi that was enriched with word embeddings. The result is a new Arabizi lexicon consisting of 11.3K positive and 13.3K negative words. We evaluated this lexicon by classifying the sentiment of Arabizi tweets achieving an F1-score of 0.72. We provide a detailed error analysis to present the challenges that impact the sentiment analysis of Arabizi. © 2019 Association for Computational Linguistics (ACL). All rights reserved.

Keywords: Sentiment analysis

Accelerating local SGD for non-IID data using variance reduction

Frontiers of Computer Science, 2023, Vol. 17, No. 2, pp. 73-89

Authors: Xianfeng LIANG; Shuheng SHEN; Enhong CHEN; Jinchang LIU; Qi LIU; Yifei CHENG; Zhen PAN. Affiliations: Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China, Hefei 230027, China; Ant Financial Services Group, Hangzhou 310000, China; Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong 999077, China

Distributed stochastic gradient descent and its variants have been widely adopted in the training of machine learning models, which apply multiple workers in *** them, local-based algorithms, including Local SGD and FedAvg, have gained much attention due to their superior properties, such as low communication cost and ***, when the data distribution on workers is non-identical, local-based algorithms would encounter a significant degradation in the convergence *** this paper, we propose Variance Reduced Local SGD (VRL-SGD) to deal with the heterogeneous *** extra communication cost, VRL-SGD can reduce the gradient variance among workers caused by the heterogeneous data, and thus it prevents local-based algorithms from slow convergence ***, we present VRL-SGD-W with an effective warm-up mechanism for the scenarios, where the data among workers are quite *** from eliminating the impact of such heterogeneous data, we theoretically prove that VRL-SGD achieves a linear iteration speedup with lower communication complexity even if workers access non-identical *** conduct experiments on three machine learning *** experimental results demonstrate that VRL-SGD performs impressively better than Local SGD for the heterogeneous data and VRL-SGD-W is much more robust under high data variance among workers.

Keywords: distributed optimization variance reduction local SGD federated learning non-IID data

Learning-based interactive retrieval in large-scale multimedia collections

9th International Workshop on Adaptive Multimedia Retrieval, AMR 2011

Authors: Mohamed, Hisham; Von Wyl, Marc; Bruno, Eric; Marchand-Maillet, Stéphane. Affiliations: Viper Group, Department of Computer Science, University of Geneva, Switzerland; Data Mining and Knowledge Discovery - Corporate R and D Division, Firmenich SA, Switzerland

ISBN: (print) 9783642374241

Indexing web-scale multimedia is only possible by distributing storage and computing efforts. Existing large-scale content-based indexing services mostly do not offer interactive relevance feedback. Here, we detail the construction of our Cross-Modal Search Engine (CMSE), which implements a query-by-example search strategy with relevance feedback and is distributed over a cluster of 20 dual-core machines using MPI. We present the performance gain in terms of interactivity (search time) using a part of the Image-Net collection containing more than one million images as a base example. © 2013 Springer-Verlag Berlin Heidelberg.

Keywords: Search engines
