Nowadays, most streaming data sources are becoming high-dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained a significant importance...
Deduplication has been commonly used in both enterprise storage systems and cloud storage. To overcome the performance challenge of selective restore operations in deduplication systems, a solid-state-drive-based (SSD-based) read cache can be deployed to speed up restores by dynamically caching popular restore contents. Unfortunately, the frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSDs' lifetime while slowing down I/O processes in SSDs. To address this problem, we propose a new solution, LOP-Cache, which greatly improves the write durability of SSDs as well as I/O performance by enlarging the proportion of long-term popular (LOP) data among the data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long time period to decrease the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache with only 5.56% of the capacity of the deduplicated data. Importantly, LOP-Cache improves SSDs' lifetime by a factor of 9.77. The evidence shows that LOP-Cache offers a cost-efficient SSD-based read cache solution to boost the performance of selective restore for deduplication systems.
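The idea of writing only repeatedly requested data into the cache can be illustrated with a minimal sketch. This is not the LOP-Cache algorithm itself, whose details the abstract does not give. The admission rule (`min_hits`), the function names, and the toy trace are all assumptions made for illustration. The sketch compares the number of cache writes under plain LRU with those under an LRU cache that admits a block only after it has been requested a given number of times.

```python
from collections import Counter, OrderedDict


def lru_writes(trace, capacity):
    """Count cache writes for a plain LRU cache that admits every miss."""
    cache = OrderedDict()
    writes = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)          # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used block
            cache[key] = True
            writes += 1                     # every miss costs one SSD write
    return writes


def popularity_gated_writes(trace, capacity, min_hits):
    """Count cache writes when a block is admitted only after min_hits requests.

    min_hits is a hypothetical popularity threshold, not a parameter from the paper.
    """
    cache = OrderedDict()
    seen = Counter()
    writes = 0
    for key in trace:
        seen[key] += 1
        if key in cache:
            cache.move_to_end(key)
        elif seen[key] >= min_hits:         # admit only blocks that proved popular
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[key] = True
            writes += 1
    return writes


# Toy restore trace: block "a" is popular, the others are mostly one-off requests.
trace = ["a", "b", "a", "c", "a", "d", "a", "e", "a", "b"]
print(lru_writes(trace, capacity=2))                           # 6
print(popularity_gated_writes(trace, capacity=2, min_hits=2))  # 2
```

On this toy trace the gated cache issues two writes where plain LRU issues six, which is the kind of write reduction the abstract attributes to keeping unpopular data out of the SSD.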
ISBN: 9783000257704 (print)
The limited coverage of available Arabic language lexicons poses a serious challenge in Arabic cross-language information retrieval. Translation in cross-language information retrieval consists of assigning one of the semantic representation terms in the target language to the intended query. Besides the problem of dictionary completeness, we also face the problem of deciding which of the translations proposed by the dictionary for each query term should be included in the query translation. In this paper, we describe the implementation and evaluation of an Arabic/English word translation disambiguation approach that is based on exploiting a large bilingual corpus and statistical co-occurrence to find the correct sense for the query translation terms. The correct word translations of the given query term are determined based on their cohesion with words in the training corpus and a special similarity score measure. The specific properties of the Arabic language that frequently hinder the correct match are taken into account.
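Co-occurrence-based disambiguation of this kind can be sketched as follows. The abstract does not specify its similarity score, so the Dice coefficient used here is an assumption, as are the function names, the toy English corpus, and the candidate lists (which stand in for dictionary translations of two query terms). Each term receives the candidate that co-occurs most strongly with the candidates of the other terms.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, per sentence, how often each word and each word pair occurs."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_freq.update(words)
        pair_freq.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_freq, pair_freq


def dice(a, b, word_freq, pair_freq):
    """Dice coefficient: an assumed stand-in for the paper's similarity score."""
    denom = word_freq[a] + word_freq[b]
    return 2 * pair_freq[frozenset((a, b))] / denom if denom else 0.0


def disambiguate(candidates, word_freq, pair_freq):
    """Pick, for each query term, the candidate most cohesive with the other terms.

    candidates holds one list of candidate translations per query term.
    """
    chosen = []
    for i, options in enumerate(candidates):
        context = [w for j, opts in enumerate(candidates) if j != i for w in opts]
        best = max(
            options,
            key=lambda t: sum(dice(t, c, word_freq, pair_freq) for c in context),
        )
        chosen.append(best)
    return chosen


# Hypothetical target-language corpus and dictionary candidates.
corpus = [
    "the bank approved the loan",
    "the river bank was flooded",
    "interest on the loan rose",
    "the shore of the river",
]
word_freq, pair_freq = cooccurrence_counts(corpus)
candidates = [["bank", "shore"], ["loan", "advance"]]
print(disambiguate(candidates, word_freq, pair_freq))  # ['bank', 'loan']
```

A real system would build the counts from a large corpus and handle Arabic-specific morphology before lookup, which this sketch omits.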
The combination of visual and textual information in image retrieval remarkably alleviates the semantic gap of traditional image retrieval methods, and thus it has attracted much attention *** retrieval based on such a combination is usually called the content-and-text based image retrieval (CTBIR). Nevertheless, existing studies in CTBIR mainly make efforts on improving the retrieval *** the best of our knowledge, little attention has been focused on how to enhance the retrieval ***, image data is widespread and expanding rapidly in our daily ***, it is important and interesting to investigate the retrieval *** this end, this paper presents an efficient image retrieval method named CATIRI (content-and-text based image retrieval using indexing). CATIRI follows a three-phase solution framework that develops a new indexing structure called *** MHIM-tree seamlessly integrates several elements including Manhattan Hashing, Inverted index, and *** use our MHIM-tree wisely in the query, we present a set of important metrics and reveal their inherent *** on them, we develop a top-k query algorithm for *** results based on benchmark image datasets demonstrate that CATIRI outperforms the competitors by an order of magnitude.
In this paper we present our approach to the 2010 ImageClef PhotoAnnotation task. Based on the well-known bag-of-words (BoW) approach, we suggest two extensions. First, we analyzed the impact of category-specific features and classifiers. In order to classify quality-related image categories, we implemented a sharpness measure and used it as an additional feature in the classification process. Second, we propose a post-classification step, which is based on the observation that many of the categories should be considered as being related to each other: some categories exclude others or allow inferences about them. We incorporate inference and exclusion rules by refining the classification results. The results we obtain show that both extensions can increase classification performance compared with the standard BoW approach.
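A post-classification refinement with exclusion and inference rules might look like the sketch below. The abstract does not state the actual rules, so the category names, the threshold, and the way scores are adjusted are all assumptions chosen for illustration.

```python
def refine(scores, exclusions, implications, threshold=0.5):
    """Refine per-category scores with exclusion and inference rules.

    scores maps category name to classifier confidence in [0, 1].
    exclusions lists pairs of categories that cannot both hold.
    implications lists (premise, consequence) pairs.
    All rule semantics here are illustrative assumptions.
    """
    refined = dict(scores)
    # Exclusion: when both categories pass the threshold, drop the weaker one.
    for a, b in exclusions:
        if refined[a] >= threshold and refined[b] >= threshold:
            weaker = a if refined[a] < refined[b] else b
            refined[weaker] = 0.0
    # Inference: a confident premise lifts its consequence to the same score.
    for premise, consequence in implications:
        if refined[premise] >= threshold:
            refined[consequence] = max(refined[consequence], refined[premise])
    return refined


# Hypothetical categories and classifier outputs.
scores = {"Day": 0.8, "Night": 0.6, "Sunny": 0.7, "Outdoor": 0.3}
refined = refine(
    scores,
    exclusions=[("Day", "Night")],
    implications=[("Sunny", "Outdoor")],
)
print(refined)  # {'Day': 0.8, 'Night': 0.0, 'Sunny': 0.7, 'Outdoor': 0.7}
```

Applying exclusions before inferences keeps a suppressed category from triggering an inference, which is one reasonable ordering among several.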
This paper presents an approach for modeling location-based profiles of social image media based on tagging information and collaborative geo-reference annotations. We utilize pattern mining techniques for obtaining s...
This paper presents an approach to assess the quality of mappings used to generate RDF datasets. Data quality is a multidimensional concept determined by many factors which influence the extent to which a dataset is u...
Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It has a proven popularity on chat platforms and social media, yet it suffers from a severe lack of natural languag...
Distributed stochastic gradient descent and its variants have been widely adopted in the training of machine learning models, which apply multiple workers in *** them, local-based algorithms, including Local SGD and FedAvg, have gained much attention due to their superior properties, such as low communication cost and ***, when the data distribution on workers is non-identical, local-based algorithms would encounter a significant degradation in the convergence *** this paper, we propose Variance Reduced Local SGD (VRL-SGD) to deal with the heterogeneous *** extra communication cost, VRL-SGD can reduce the gradient variance among workers caused by the heterogeneous data, and thus it prevents local-based algorithms from slow convergence ***, we present VRL-SGD-W with an effective warm-up mechanism for the scenarios where the data among workers are quite *** from eliminating the impact of such heterogeneous data, we theoretically prove that VRL-SGD achieves a linear iteration speedup with lower communication complexity even if workers access non-identical *** conduct experiments on three machine learning *** experimental results demonstrate that VRL-SGD performs impressively better than Local SGD for the heterogeneous data and VRL-SGD-W is much more robust under high data variance among workers.
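For orientation, the communication pattern of the Local SGD baseline (several local steps per worker, then one averaging step) can be sketched on a toy problem. This is only the baseline, not VRL-SGD, whose variance-reduction step the abstract does not describe. The 1-D quadratic objectives, the use of exact rather than stochastic gradients, and all parameter names are assumptions.

```python
def local_sgd(centers, rounds=50, local_steps=5, lr=0.1, x0=0.0):
    """Local gradient descent with periodic model averaging on 1-D quadratics.

    Worker i minimizes 0.5 * (x - centers[i]) ** 2, so differing centers model
    non-identical data. Exact gradients are used for simplicity (an assumption).
    """
    x = x0
    for _ in range(rounds):
        local_models = []
        for c in centers:
            xi = x                        # each worker starts from the shared model
            for _ in range(local_steps):
                xi -= lr * (xi - c)       # local step, no communication
            local_models.append(xi)
        x = sum(local_models) / len(local_models)  # one communication per round
    return x


# Three workers with heterogeneous objectives; the global optimum is the mean, 1.0.
centers = [-1.0, 0.0, 4.0]
x_final = local_sgd(centers)
print(round(x_final, 6))  # 1.0
```

In this noiseless toy with equal curvature on every worker, averaging recovers the global optimum. The slowdown discussed in the abstract concerns stochastic gradients on heterogeneous data, which this sketch does not reproduce.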
Indexing web-scale multimedia is only possible by distributing storage and computing efforts. Existing large-scale content-based indexing services mostly do not offer interactive relevance feedback. Here, we detail th...