ISBN: 9781467312332 (print)
This paper discusses basic issues about the performance of semi-structured query processing in very large datasets. It is based on recent algorithm-engineering work, on the state of the art in performance management for XML query processing, and on theoretical studies of the complexity structure of the querying problem. Its main conclusions provide a concrete view of the interaction between terabyte-scale XML data, query complexity, and current or future computer architectures. To provide a concrete and synthetic view of this diverse body of knowledge, the presentation follows a fictional use case whose characters face query problems of varying complexity that are set in multiple contexts and analyzed today before being projected to 2020 and 2030.
A wireless sensor network (WSN) can be construed as an intelligent, large-scale device for observing and measuring properties of the physical world. In recent years, the database research community has championed the view that if we construe a WSN as a database (i.e., if a significant aspect of its intelligent behavior is that it can execute declaratively expressed queries), then one can achieve a significant reduction in the cost of engineering the software that implements a data collection program for the WSN while still achieving, through query optimization, very favorable cost:benefit ratios. This paper describes a query processing framework for WSNs that meets many desiderata associated with the view of WSNs as databases. The framework is presented in the form of a compiler/optimizer, called SNEE, for a continuous declarative query language over sensed data streams, called SNEEql. SNEEql can be shown to meet the expressiveness requirements of a large class of applications. SNEE can be shown to generate effective and efficient query evaluation plans. More specifically, the paper describes the following contributions: (1) a user-level syntax and physical algebra for SNEEql, an expressive continuous query language over WSNs; (2) example concrete algorithms for physical algebraic operators defined in such a way that the task of deriving memory, time and energy analytical cost-estimation models (CEMs) for them becomes straightforward by reduction to a structural traversal of the pseudocode; (3) CEMs for the concrete algorithms alluded to; (4) an architecture for the optimization of SNEEql queries, called SNEE, building on well-established distributed query processing components where possible, but making enhancements or refinements where necessary to accommodate the WSN context; (5) algorithms that instantiate the components in the SNEE architecture, thereby supporting integrated query planning that includes routing, placement and timing; and (6) an empirical performance eva
Over recent years, massive amounts of geospatial information have been produced at a prodigious rate, usually geographically distributed across the Internet. Grid computing, a recent development in the landscape of distributed computing, is regarded as a good solution for distributed geospatial data management and manipulation. Grid computing technology can thus be applied to integrate various distributed resources into a 'super-computer' that enables efficient distributed geospatial query processing. To realize this vision, an effective mechanism for building the distributed geospatial query workflow in the Grid environment needs to be carefully designed. The workflow-building technology aims to automatically transform the global geospatial query into an equivalent distributed query process in the Grid. In response to this goal, detailed steps and algorithms for building the distributed geospatial query workflow in the Grid environment are discussed in this article. Moreover, we develop corresponding software tools that enable Grid-based geospatial queries to be run against multiple data resources. Experimental results demonstrate that the proposed methodology is feasible and correct.
Nowadays, Spatial Data Infrastructures (SDIs) play an important role in government agencies, at different levels: global, national, and local. They aim to improve the management and sharing of geospatial data. Nonetheless, these SDIs have been developed as information islands, in which a user's query is compared to metadata described only in their own catalog services. The lack of interaction among SDIs limits the potential of these infrastructures in providing geospatial data to a larger audience. This article presents a distributed architecture, based on a federation of SDIs which interact among themselves, using query propagation. This propagation facilitates data discovery and sharing. We also describe a distributed query processing service used to enable the resource discovery in distributed infrastructures.
This paper aims to provide a service-oriented data integration solution over data Grids for cases where distributed data sources are partitioned with overlapping sections of various proportions. This is an interesting variation which combines both replicated and partitioned data within the same data management framework. Thus, the data management infrastructure has to deal with specific challenges regarding the identification, access and aggregation of partitioned data with varying proportions of overlapping sections. In order to provide a solution we have extended a well-known data access and integration middleware, namely Open Grid Services Architecture-Data Access and Integration distributed query processing (OGSA-DAI DQP), with distributed query processing facilities, by incorporating the new 'UnionPartitions' operator into its algebra in order to cope with various unusual forms of horizontally partitioned databases. Our solution extends OGSA-DAI DQP in two aspects: (1) a new operator type is added to the algebra to handle the union of the partitions with different characteristics, and (2) the OGSA-DAI DQP Federation Description is extended to include some more metadata to facilitate the successful execution of the newly introduced operator. (C) 2010 Elsevier B.V. All rights reserved.
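The idea of taking a union over horizontally partitioned sources with overlapping sections can be illustrated with a minimal Python sketch. It is not the 'UnionPartitions' operator of OGSA-DAI DQP: the function name `union_partitions`, the row layout, and the assumption that overlapping rows share a unique key column are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Iterable, List

Row = Dict[str, object]


def union_partitions(partitions: Iterable[Iterable[Row]], key: str) -> List[Row]:
    """Union horizontally partitioned rows, keeping one copy per key value.

    Assumption (not from the abstract): rows in overlapping sections are
    identical replicas and can be recognised by a unique key column.
    """
    seen: Dict[Hashable, Row] = {}
    for partition in partitions:
        for row in partition:
            # Overlapping sections repeat rows; keep the first copy seen.
            seen.setdefault(row[key], row)
    return list(seen.values())


# Two hypothetical sites whose partitions overlap on id == 2.
site_a = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
site_b = [{"id": 2, "name": "beta"}, {"id": 3, "name": "gamma"}]
merged = union_partitions([site_a, site_b], key="id")
```

A real federation would also need the kind of metadata the abstract mentions (which source holds which partition, and how much they overlap) in order to decide where to send sub-queries; the sketch only covers the final merge step.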
ISBN: 9783642235344 (print)
Large amounts of uncertain data are collected by many emerging applications that draw on multiple distributed sources. Previous efforts on querying uncertain data in distributed environments have focused only on ranking and skyline queries; join queries have not been addressed despite their importance in databases. In this paper, we address the distributed probabilistic threshold join query, which retrieves, from distributed sites, results that satisfy the join condition and whose combined probabilities meet the threshold requirement. We propose a new kind of Bloom filter, called the Probability Bloom Filter (PBF), to represent sets with a probabilistic attribute, and design a PBF-based Bloomjoin algorithm for executing distributed probabilistic threshold join queries with low communication cost. Furthermore, we provide a theoretical analysis of the network cost of our algorithm and demonstrate it by simulation. The experimental results show that our algorithm reduces network cost compared with the original Bloomjoin algorithm in most scenarios.
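For orientation, the classic Bloomjoin that the abstract uses as its baseline can be sketched in Python with a threshold check added. This is not the PBF design, which the abstract does not specify: the filter here is a plain Bloom filter, and the sketch assumes that each tuple carries an independent existence probability and that the combined probability of a join result is the product of the two. The names `BloomFilter` and `bloomjoin` and the tuple layout are hypothetical.

```python
import hashlib
from typing import Iterator, List, Tuple

# (join key, payload, existence probability) -- a hypothetical tuple layout
UTuple = Tuple[str, str, float]


class BloomFilter:
    """Plain Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def bloomjoin(site_a: List[UTuple], site_b: List[UTuple], threshold: float):
    # Site A: a tuple with p < threshold can never qualify, because the
    # product with any probability <= 1 stays below the threshold.
    candidates_a = [t for t in site_a if t[2] >= threshold]
    bloom = BloomFilter()
    for key, _, _ in candidates_a:
        bloom.add(key)
    # Site B: ship only tuples that may have a join partner at site A.
    shipped = [t for t in site_b
               if t[2] >= threshold and bloom.might_contain(t[0])]
    # Site A: exact join removes Bloom false positives, then apply threshold.
    return [(ka, va, vb, pa * pb)
            for ka, va, pa in candidates_a
            for kb, vb, pb in shipped
            if ka == kb and pa * pb >= threshold]


site_a = [("k1", "a1", 0.9), ("k2", "a2", 0.3)]
site_b = [("k1", "b1", 0.8), ("k3", "b3", 0.9)]
results = bloomjoin(site_a, site_b, threshold=0.5)
```

The communication saving comes from the second step: only the filter travels from site A to site B, and only tuples that pass the filter travel back. A probability-aware filter would presumably let site B prune further, but how PBF does this is not stated in the abstract.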
ISBN: 9783642245763; 9783642245770 (print)
The SNEE query optimizer enables users to characterize data requests against wireless sensor networks (WSNs), using a declarative query language called SNEEql (SNEE for Sensor NEtwork Engine, described in [GBG+11], and publicly available at http://***/p/snee). Queries are compiled into imperative query execution plans, which are translated into executable nesC source code. In this paper, we illustrate the lifecycle of a SNEEql query Q for in-network execution. This lifecycle encompasses the steps of preparatory metadata collection, followed by the compilation of Q into a query execution plan QEP, the dissemination of binary images implementing QEP throughout the WSN, and the generation of query results.
In this paper, we consider skyline queries in a mobile and distributed environment, where data objects are distributed in some sites (database servers) which are interconnected through a high-speed wired network, and queries are issued by mobile units (laptop, cell phone, etc.) which access the data objects of database servers by wireless *** inherent properties of mobile computing environment such as mobility, limited wireless bandwidth, frequent disconnection, make skyline queries more *** show how to efficiently perform distributed skyline queries in a mobile environment and propose a skyline query processing approach, called efficient distributed skyline based on mobile computing (EDS-MC). In EDS-MC, a distributed skyline query is decomposed into five processing phases and each phase is elaborately designed in order to reduce the network communication, network delay and query response *** conduct extensive experiments in a simulated mobile database system, and the experimental results demonstrate the superiority of EDS-MC over other skyline query processing techniques on mobile computing.
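As background for the abstract above, a skyline query returns the points that no other point dominates. The Python sketch below shows a common baseline for the distributed case: each site computes its local skyline, and a coordinator merges the local results and filters them once more. It is not the five-phase EDS-MC method, whose phases the abstract does not describe; the function names are hypothetical, and smaller values are assumed to be better in every dimension.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse everywhere and strictly better somewhere."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point that nothing else dominates."""
    pts = list(points)
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


def distributed_skyline(sites: Sequence[Iterable[Point]]) -> List[Point]:
    # Each site sends only its local skyline, which limits network traffic.
    local_results = [skyline(site) for site in sites]
    merged = [p for local in local_results for p in local]
    # A local skyline point may still be dominated by a point at another site.
    return skyline(merged)


# Hypothetical data: two sites holding (price, distance) pairs.
site_1 = [(1.0, 9.0), (4.0, 4.0), (5.0, 5.0)]
site_2 = [(2.0, 8.0), (3.0, 3.0), (9.0, 1.0)]
result = distributed_skyline([site_1, site_2])
```

Merging local skylines is safe because any globally undominated point is also undominated at its own site, so it always reaches the coordinator.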
Recently, a number of query processors have been proposed for the evaluation of relational queries in structured P2P systems. However, as these approaches do not consider peer or link failures, they cannot be deployed for real-world applications without extensions. We show that typical failures in structured P2P systems can have an unpredictable impact on the correctness of the result. In particular, stateful operators that store intermediate results on peers, e.g., the distributed hash join, must protect such results against failures. Although many replication schemes for P2P systems exist, they cannot replicate operator states while the query is being processed. In this paper we propose an in-query replication scheme which replicates the state of an operator among the neighbors of the processing peer. Our analytical evaluation shows that the network overhead of the in-query replication is in O(1) with respect to network size, i.e., our scheme is scalable. We have carried out an extensive experimental evaluation using simulations as well as a PlanetLab deployment. It confirms the effectiveness and the efficiency of the in-query replication scheme and shows the effectiveness of the routing extension in networks of varying reliability.
Wireless sensor networks (WSNs) are composed of several sensors with limited memory, processing power, communication bandwidth, and energy, which cooperate in performing a given task. The use of the database paradigm has emerged in the last few years as a viable solution to manage data in such a context. In this paper we present the MaD-WiSe system, a distributed query processing framework that moves the processing of the query into the network. MaD-WiSe reconsiders various aspects of database system design and reinterprets them according to the WSN constraints and requirements. In particular, it considers the definition of a query language to formalize the queries, a stream model to manage data acquired by the sensors, a query algebra to define the operators that actually perform the query, and energy efficiency and query optimization strategies for saving energy. Copyright (c) 2010 John Wiley & Sons, Ltd.