To provide an overview of a query object to fast target users' demands, it is important to make the retrieved results as diverse as possible. In this paper, we fuse latent Dirichlet allocation and three text-based...
详细信息
To provide an overview of a query object to fast target users' demands, it is important to make the retrieved results as diverse as possible. In this paper, we fuse latent Dirichlet allocation and three text-based schemes to improve diversity of retrieved results. The experiments prove our advantages.
In this paper, we propose a method of preserving cumulative node constraints, which are derived from business rules such as policies during XML update. To handle the constraints, we treat the collection of nodes as ob...
详细信息
ISBN:
(纸本)9780769550220
In this paper, we propose a method of preserving cumulative node constraints, which are derived from business rules such as policies during XML update. To handle the constraints, we treat the collection of nodes as objects that have to be preserved. For any triggered update query, the query will be processed into a query object. And then, we use an object distance algorithm to carry out a validation process to compare the constraint object and the query object. To support the experimentation, we utilize Schematron as the language to convey the constraints. Lastly, we show evaluation of our algorithm on different sizes of data and analyse the performance of the algorithm in terms of the validation quality.
Nearest neighbor (NN) search in high dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a sim...
详细信息
ISBN:
(纸本)9781424489589
Nearest neighbor (NN) search in high dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a simple sequential scan of data when answering exact NN query. To improve the efficiency of NN search, locality sensitive hashing (LSH) and its variants have been proposed to find approximate NN. They adopt hash functions that can preserve the Euclidean distance so that similar objects have a high probability of colliding in the same bucket. Given a query object, candidate for the query result is obtained by accessing the points that are located in the same bucket. To improve the precision, each hash table is associated with m hash functions to recursively hash the data points into smaller buckets and remove the false positives. On the other hand, multiple hash tables are required to guarantee a high retrieval recall. Thus, tuning a good tradeoff between precision and recall becomes the main challenge for LSH. Recently, locality sensitive B-tree(LSB-tree) has been proposed to ensure both quality and efficiency. However, the index uses random I/O access. When the multimedia database is large, it requires considerable disk I/O cost to obtain an approximate ratio that works in practice. In this paper, we propose a novel index structure, named HashFile, for efficient retrieval of multimedia objects. It combines the advantages of random projection and linear scan. Unlike the LSH family in which each bucket is associated with a concatenation of m hash values, we only recursively partition the dense buckets and organize them as a tree structure. Given a query point q, the search algorithm explores the buckets near the query object in a top-down manner. The candidate buckets in each node are stored sequentially in increasing order of the hash value and can be efficiently loaded into memory for linear scan. HashFile can support both exact and approximate NN queries.
Background: The ability to generate transcriptional data on the scale of entire genomes has been a boon both in the improvement of biological understanding and in the amount of data generated. The latter, the amount o...
详细信息
Background: The ability to generate transcriptional data on the scale of entire genomes has been a boon both in the improvement of biological understanding and in the amount of data generated. The latter, the amount of data generated, has implications when it comes to effective storage, analysis and sharing of these data. A number of software tools have been developed to store, analyze, and share microarray data. However, a majority of these tools do not offer all of these features nor do they specifically target the commonly used two color Agilent DNA microarray platform. Thus, the motivating factor for the development of EDGE(3) was to incorporate the storage, analysis and sharing of microarray data in a manner that would provide a means for research groups to collaborate on Agilent-based microarray experiments without a large investment in software-related expenditures or extensive training of end-users. Results: EDGE(3) has been developed with two major functions in mind. The first function is to provide a workflow process for the generation of microarray data by a research laboratory or a microarray facility. The second is to store, analyze, and share microarray data in a manner that doesn't require complicated software. To satisfy the first function, EDGE(3) has been developed as a means to establish a well defined experimental workflow and information system for microarray generation. To satisfy the second function, the software application utilized as the user interface of EDGE(3) is a web browser. Within the web browser, a user is able to access the entire functionality, including, but not limited to, the ability to perform a number of bioinformatics based analyses, collaborate between research groups through a user-based security model, and access to the raw data files and quality control files generated by the software used to extract the signals from an array image. Conclusion: Here, we present EDGE(3), an open-source, web-based application that allows f
暂无评论