ISBN: 0769522491 (print)
Efficient similarity search in large string databases requires effective index support. Since each long string has numerous substrings of arbitrary length, designing an effective index is a great challenge. The existing solution, namely MRS [10], employs a low-cost lower bound function to sieve out the most similar candidates from the majority of unlikely database substrings. Therefore, only very small portions of string databases require the expensive true edit distance computation to finalize the query. Significant savings in overall query processing cost can be realized by the filtration feature of lower bound functions. In this paper, we seek to improve MRS to its full potential. Specifically, we propose a very simple method that exchanges the roles of database strings and query string in the original MRS design. Despite its simplicity, our solution can further improve the query performance by 10 times in terms of disk page accesses while using only half of the original index's size.
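The filter-and-refine idea described in this abstract can be illustrated with a minimal sketch. The abstract does not say which lower bound function is used, so the character-frequency bound below is an assumption chosen for illustration, and the names `frequency_lower_bound`, `edit_distance`, and `range_search` are hypothetical. The sketch scans whole strings in memory and does not model the index structure or the role exchange the paper proposes.

```python
from collections import Counter


def frequency_lower_bound(a: str, b: str) -> int:
    """Cheap lower bound on edit distance from character counts (assumed bound)."""
    ca, cb = Counter(a), Counter(b)
    surplus = sum((ca - cb).values())  # characters a has in excess of b
    deficit = sum((cb - ca).values())  # characters b has in excess of a
    # One edit operation changes each of these totals by at most 1,
    # so the larger of the two never exceeds the true edit distance.
    return max(surplus, deficit)


def edit_distance(a: str, b: str) -> int:
    """Standard dynamic-programming Levenshtein distance (the expensive step)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def range_search(query, candidates, threshold):
    """Return (matches, number of exact edit distance computations performed)."""
    matches, exact_calls = [], 0
    for cand in candidates:
        # Filter: if even the lower bound exceeds the threshold, skip the candidate.
        if frequency_lower_bound(query, cand) > threshold:
            continue
        # Refine: only surviving candidates pay for the true edit distance.
        exact_calls += 1
        if edit_distance(query, cand) <= threshold:
            matches.append(cand)
    return matches, exact_calls
```

Calling `range_search("kitten", ["sitten", "zzzzzz", "kitchen", "mitten"], 1)` computes the exact distance for only two of the four candidates, which is the kind of saving the filtration step is meant to provide.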
ISBN: 9783319181202; 9783319181196 (print)
Flash-based solid state devices (SSDs) are making deep inroads into enterprise database applications due to their faster data access. The capacity and performance characteristics of SSDs make them well suited for use as a second-level buffer cache. In this paper, we propose an SSD-based multilevel buffer scheme, called flash-aware second-level cache (FASC), where the SSD serves as an extension of the DRAM buffer. Our goal is to reduce the number of disk accesses by caching the pages evicted from DRAM in the SSD, thereby enhancing the performance of database systems. For this purpose, a cost-aware main memory replacement policy is proposed, which can efficiently reduce the cost of page evictions. To take full advantage of the SSD, a block-based data management policy is designed to save memory overhead and to reduce the write amplification of flash memory. To identify the hot pages that yield large performance benefits, a memory-efficient replacement policy is proposed for the SSD. Moreover, we also present a lightweight recovery policy, which is used to recover the data cached in the SSD in case of a system crash. We implement a prototype based on PostgreSQL and evaluate the performance of FASC. The experimental results show that FASC achieves significant performance improvements.
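The core idea of a second-level cache, where pages evicted from DRAM are kept in the SSD rather than discarded, can be sketched as follows. This is a toy model under stated assumptions: both levels use plain LRU in place of the cost-aware and memory-efficient policies the abstract mentions, there is no block-based management or recovery, and the class name `TwoLevelCache` and its parameters are hypothetical rather than part of FASC.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy DRAM + SSD buffer; LRU at both levels is an assumption, not FASC's policy."""

    def __init__(self, dram_pages, ssd_pages, read_from_disk):
        self.dram = OrderedDict()  # first level, least recently used first
        self.ssd = OrderedDict()   # second level, holds DRAM victims
        self.dram_pages = dram_pages
        self.ssd_pages = ssd_pages
        self.read_from_disk = read_from_disk  # hypothetical callback: page_id -> data
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.dram:
            # DRAM hit: refresh recency.
            self.dram.move_to_end(page_id)
            return self.dram[page_id]
        if page_id in self.ssd:
            # SSD hit: promote to DRAM without touching the disk.
            data = self.ssd.pop(page_id)
        else:
            # Miss at both levels: pay for a disk read.
            data = self.read_from_disk(page_id)
            self.disk_reads += 1
        self._admit_to_dram(page_id, data)
        return data

    def _admit_to_dram(self, page_id, data):
        self.dram[page_id] = data
        if len(self.dram) > self.dram_pages:
            # The DRAM victim is cached in the SSD instead of being dropped.
            victim_id, victim = self.dram.popitem(last=False)
            self.ssd[victim_id] = victim
            if len(self.ssd) > self.ssd_pages:
                self.ssd.popitem(last=False)
```

With a DRAM capacity of two pages, reading pages 1, 2, 3 and then 1 again costs three disk reads rather than four, because page 1 is served from the SSD level on its second access.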
This paper introduces a semantic metadata mapping procedure which is able to maximize the interoperability among metadata. The methodology consists of three processes: identifying metadata element sets, groupin...
ISBN: 9783030731939; 9783030731946 (print)
Region of Interest (ROI) queries are of great importance in many location-based services. However, previous studies on ROI queries usually adopt either a simple spatial data model or an insufficiently flexible query geometry, e.g., a fixed-size rectangle. In this paper, to fix these drawbacks, we propose a new ROI search operator called the Radius Bounded ROI (RBR) query. An RBR query retrieves a subset of spatial objects that satisfies co-location constraints and maximizes a user-configurable score function at the same time. We formally prove that answering an RBR query is 3SUM-hard, which implies that a sub-quadratic solution is unlikely to exist. To answer RBR queries efficiently, we propose three algorithms, PairEnum, BaseRotation and OptRotation, based on novel geometric findings. In addition, the query processing technique we propose can be easily extended to other related problems such as top-k ROI search. To demonstrate both the efficiency and the effectiveness of our proposed algorithms, we conduct extensive experimental studies on both real-world datasets and synthetic benchmarks. The results show that OptRotation, our most efficient algorithm, achieves more than a 10^3-fold efficiency improvement on both real and synthetic datasets compared with the baseline algorithm.
ISBN: 9783642374876 (digital); 9783642374869 (print)
This two-volume set, LNCS 7825 and LNCS 7826, constitutes the refereed proceedings of the 18th International Conference on Database Systems for Advanced Applications, DASFAA 2013, held in Wuhan, China, in April 2013. The 51 revised full papers and 10 short papers presented together with 2 invited keynote talks, 1 invited paper, 3 industrial papers, 9 demo presentations, 4 tutorials and 1 panel paper were carefully reviewed and selected from a total of 227 submissions. The topics covered in part 1 are social networks; query processing; nearest neighbor search; index; query analysis; XML data management; privacy protection; and uncertain data management; and in part 2: graph data management; physical design; knowledge management; temporal data management; social networks; query processing; data mining; applications; and database applications.
The free tree, a special graph that is connected, undirected and acyclic, has been extensively used in bioinformatics, pattern recognition, computer networks, XML databases, etc. Recent research on structural pattern ...
Data completeness is an essential aspect of data quality as in many scenarios it is crucial to guarantee the completeness of query answers. Data might be incomplete in two ways: records may be missing as a whole, or a...
ISBN: 3540260315 (print)
The paper presents MDDQL as a query language suitable for multilingual conceptual querying of collections of databases from a graphical user interface or from an application programming interface. The query language, however, has been specified and implemented with the parametric theory of linguistic diversity in mind, such that syntactic and semantic parsing of multilingual query expressions becomes quite simple and guarantees identical query results regardless of the preferred natural language. We present a parsing algorithm which shows that it is quite easy to formulate a query regardless of the underlying type order of a natural language, be it Subject-Object-Verb, Subject-Verb-Object, or Object-Verb-Subject, etc., and still grasp the meaning of the query with the minimal possible computational effort.
The overall computing power available today has made it possible for small enterprises and laboratories to develop applications that need large amounts of storage. This storage has traditionally been expensive, using propi...
A large amount of biological knowledge today is only available from full-text research papers. Since neither manual database curators nor users can keep up with the rapidly expanding volume of scientific literature, n...