Geographic objects with descriptive text are increasingly prevalent in web services such as Google Maps. The keyword query, which combines location information with a textual description, has attracted much recent attention. Previous works mainly focus on finding the top-k nearest neighbours, where each object has to match the whole set of query keywords. A collective query has been proposed to retrieve a group of objects nearest to the query object such that the group's keywords cover the query's keywords and the group has the shortest inner-object distance. However, the previous method does not consider the density of data objects in the spatial space. In practice, a group of dense data objects around a query point is more interesting than a sparse one, and the inner-object distance of a group alone cannot reflect its density. To overcome this shortcoming, we propose an approximate algorithm that processes the collective spatial keyword query based on both density and inner-object distance. An empirical study shows that our algorithm effectively retrieves data objects in dense areas.
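The collective query described above can be illustrated with a brute-force sketch (toy data and a simplified cost function; the paper's density-aware approximate algorithm is not reproduced here). A candidate group is feasible when its combined keywords cover the query keywords; among feasible groups we pick the one with the lowest combined distance cost.

```python
from itertools import combinations
from math import dist

def collective_query(objects, query_point, query_keywords, max_size=3):
    """objects: list of (point, keyword_set). Returns the best covering group."""
    best, best_cost = None, float("inf")
    for size in range(1, max_size + 1):
        for group in combinations(objects, size):
            covered = set().union(*(kw for _, kw in group))
            if not query_keywords <= covered:
                continue  # group does not cover the query keywords
            # cost: farthest member from the query plus largest inner distance
            inner = max((dist(a, b) for (a, _), (b, _) in combinations(group, 2)),
                        default=0.0)
            to_query = max(dist(p, query_point) for p, _ in group)
            cost = to_query + inner
            if cost < best_cost:
                best, best_cost = group, cost
    return best, best_cost

# two nearby objects jointly covering the keywords beat one distant object
objects = [((1, 1), {"cafe"}), ((1, 2), {"wifi"}),
           ((9, 9), {"cafe", "wifi"}), ((2, 1), {"parking"})]
group, cost = collective_query(objects, (0, 0), {"cafe", "wifi"})
print([kw for _, kw in group])
```

The exhaustive search is exponential in group size, which is why the paper resorts to an approximate algorithm; the sketch only fixes the query semantics.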
ISBN: (Print) 9781622765034
In this paper we first describe the technology of automatic annotation transformation, which is based on the annotation adaptation algorithm (Jiang et al., 2009). It can automatically transform a human-annotated corpus from one annotation guideline to another. We then propose two optimization strategies, iterative training and predict-self reestimation, to further improve the accuracy of annotation guideline transformation. Experiments on Chinese word segmentation show that the iterative training strategy together with predict-self reestimation brings significant improvement over the simple annotation transformation baseline, and leads to classifiers with significantly higher accuracy and several times faster processing than annotation adaptation. On the Penn Chinese Treebank 5.0, it achieves an F-measure of 98.43%, significantly outperforming previous work despite using a single classifier with only local features.
This paper proposes a method, not for summarization but for extracting multiple facets from a text according to keyword sets representing readers' interests, so that readers can obtain the facets they are interested in and carry out faceted navigation of the text. A facet is a meaningful combination of subsets of the text. Current text processing technologies are mostly based on textual features such as word frequency, sentence location, syntactic analysis and discourse structure. These approaches neglect the cognitive process of human reading. The proposed method takes human reading cognition into account. Experiments show that the facet extraction is effective and robust.
Decoding algorithms for syntax-based machine translation suffer from high computational complexity, a consequence of intersecting a language model with a context-free grammar. Left-to-right decoding, which generates t...
Estimating taxonomic content constitutes a key problem in metagenomic sequencing data analysis. However, extracting such content from the high-throughput data of next-generation sequencing is very time-consuming with the currently available tools. Here, we present CloudLCA, a parallel LCA algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time nearly linear in dataset magnitude, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin nodes. In comparison with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% that of MEGAN running on a fat node. CloudLCA can be run on one multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate the analysis of reads, generating the same output as MEGAN, which can be imported into MEGAN directly to finish the subsequent analysis. Moreover, CloudLCA is a universal solution for finding the lowest common ancestor, and it can be applied in other fields requiring an LCA algorithm.
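The lowest-common-ancestor step that CloudLCA parallelises can be sketched on a toy taxonomy (the parent table below is a hypothetical NCBI-style fragment; CloudLCA's MapReduce-style distribution is omitted). Each read's set of hit taxa is reduced to their LCA, which becomes the read's taxonomic assignment.

```python
def ancestors(taxon, parent):
    """Path from a taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lca(taxa, parent):
    """Lowest common ancestor of a set of taxa in the parent table."""
    paths = [ancestors(t, parent) for t in taxa]
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    # walking root-ward, the first node shared by all paths is the lowest
    return next(t for t in paths[0] if t in common)

# toy fragment: species -> genus -> family -> root
parent = {"E.coli": "Escherichia", "S.enterica": "Salmonella",
          "Escherichia": "Enterobacteriaceae", "Salmonella": "Enterobacteriaceae",
          "Enterobacteriaceae": "Bacteria"}
print(lca(["E.coli", "S.enterica"], parent))  # Enterobacteriaceae
```

A read hitting both species is assigned at the family level, which is exactly the conservative behaviour MEGAN's LCA assignment is known for.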
Utility services provided by cloud computing rely on virtual customer communities forming spontaneously and evolving continuously. Clarifying the explicit boundaries of these communities is thus essential to the quali...
Most previous works on web video topic detection (e.g., the graph-based co-clustering method) struggle with real-time topic detection, since they suffer from high computational complexity. Therefore, fast topic detection is needed to meet users' or administrators' requirements in real-world scenarios. Along this line, we propose a fast and effective topic detection framework in which video streams are first partitioned into buckets using a time-window function, an incremental hierarchical clustering algorithm is then developed, and finally a video-based fusion strategy is used to integrate information from multiple modalities. Furthermore, a series of novel similarity metrics are defined within the framework. Experimental results on three months of YouTube videos demonstrate the effectiveness and efficiency of the proposed method.
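The first two steps of the framework can be sketched as follows (toy data; the paper's similarity metrics and multimodal fusion are simplified here to keyword overlap). Videos are bucketed by a fixed time window, then folded into clusters incrementally: a video joins the most similar existing topic, or opens a new one when no topic is similar enough.

```python
def bucket(videos, window):
    """Partition (timestamp, keyword_set) pairs into time-window buckets."""
    buckets = {}
    for ts, kws in videos:
        buckets.setdefault(ts // window, []).append(kws)
    return buckets

def jaccard(a, b):
    return len(a & b) / len(a | b)

def incremental_cluster(videos, window=10, threshold=0.3):
    clusters = []  # each cluster is the union of its members' keywords
    for _, kws_list in sorted(bucket(videos, window).items()):
        for kws in kws_list:
            sims = [(jaccard(kws, c), i) for i, c in enumerate(clusters)]
            best = max(sims, default=(0.0, -1))
            if best[0] >= threshold:
                clusters[best[1]] |= kws   # merge into the closest topic
            else:
                clusters.append(set(kws))  # open a new topic
    return clusters

videos = [(1, {"goal", "match"}), (2, {"goal", "cup"}),
          (15, {"eclipse", "moon"}), (16, {"match", "cup"})]
print(len(incremental_cluster(videos)))
```

Because each video is compared only against the current cluster summaries rather than against all other videos, the cost per video is linear in the number of topics, which is what makes the incremental scheme fast enough for streams.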
Local learning approaches are especially amenable to parallel processing, so they are important for cloud computing. In 1997, Lotfi A. Zadeh proposed the concept of granular computing (GrC). Zadeh proposed that three basic concepts underlie human cognition: granulation, organization and causation, with a granule being a clump of points (objects) drawn together by indistinguishability, similarity, proximity or functionality. In this paper, we present a novel local learning approach based on the concept of granular computing, named nested local learning (NGLL). Experiments show that the NGLL approach outperforms probabilistic latent semantic analysis (PLSA).
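Granulation in Zadeh's sense can be illustrated with a minimal sketch (toy one-dimensional data and an arbitrary proximity threshold; the paper's nested local learning is not reproduced). A point is drawn into a granule when it lies within `eps` of some member, so each granule is a clump of mutually proximate objects.

```python
def granulate(points, eps=1.5):
    """Greedily clump points into granules by proximity to any member."""
    granules = []
    for p in points:
        home = next((g for g in granules
                     if any(abs(p - q) <= eps for q in g)), None)
        if home is None:
            granules.append([p])  # start a new granule
        else:
            home.append(p)        # join the first sufficiently close granule
    return granules

print(granulate([0.0, 1.0, 5.0, 6.0, 12.0]))  # [[0.0, 1.0], [5.0, 6.0], [12.0]]
```

Note the greedy pass assigns each point to the first nearby granule and never merges granules; that simplification is this sketch's, not the paper's.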
To explore the association relations among disease, pathogenesis, physician, symptoms and drug, we adapt a variational Apriori algorithm for discovering association rules on a dataset of the Qing Court Medical Records...
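The association-rule mining that the study adapts can be sketched with a plain Apriori pass (toy transactions stand in for the Qing Court Medical Records, which are not reproduced here, and neither is the paper's variation). Frequent itemsets are grown level by level, pruning any candidate that has an infrequent subset.

```python
from itertools import combinations

def apriori(transactions, min_support=2):
    """Return all itemsets with support >= min_support, mapped to counts."""
    transactions = [frozenset(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    level = {s for s in items
             if sum(s <= t for t in transactions) >= min_support}
    k = 1
    while level:
        for s in level:
            frequent[s] = sum(s <= t for t in transactions)
        k += 1
        # join step: union pairs up to size k, then prune by the Apriori property
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                 and sum(c <= t for t in transactions) >= min_support}
    return frequent

# hypothetical symptom/drug records, purely illustrative
records = [{"cough", "ephedra", "licorice"},
           {"cough", "ephedra"},
           {"fever", "licorice"},
           {"cough", "ephedra", "fever"}]
freq = apriori(records)
print(freq[frozenset({"cough", "ephedra"})])  # support count 3
```

A rule such as "cough → ephedra" would then be read off a frequent itemset by comparing its support with that of its antecedent.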
ISBN: (Print) 9781627484046
In this paper, we present our system description for the CoNLL-2012 coreference resolution task on English, Chinese and Arabic. We investigate a projection-based model in which we first translate Chinese and Arabic into English, run a publicly available coreference system, and then use a new projection algorithm to map the coreferring entities back from English onto mention candidates detected in the Chinese and Arabic source. We compare against a baseline that simply runs the English coreference system on the supplied parses for Chinese and Arabic. Because our method does not beat the baseline system on the development set, we submit outputs generated by the baseline system as our final submission.
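The span-projection step in such a model can be sketched as follows (toy word alignment; the system's actual projection algorithm is not reproduced). An English mention span is mapped to the source-side tokens its words align to, and unaligned mentions are dropped.

```python
def project_span(span, alignment):
    """span: (start, end) over English tokens, inclusive.
    alignment: dict mapping English token index -> source token index.
    Returns the covering source span, or None if nothing aligns."""
    src = sorted(alignment[i]
                 for i in range(span[0], span[1] + 1) if i in alignment)
    return (src[0], src[-1]) if src else None

# hypothetical English -> Chinese token alignment; token 3 is unaligned
alignment = {0: 1, 1: 0, 2: 2, 4: 3}
print(project_span((0, 1), alignment))  # (0, 1)
print(project_span((3, 3), alignment))  # None
```

Projected spans would then be matched against mention candidates detected directly on the source side, which is where most projection noise is filtered out.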