ISBN: 9783642223501 (print)
The proceedings contain 53 papers. The topics discussed include: location-based instant search; continuous inverse ranking queries in uncertain streams; finding haystacks with needles: ranked search for data using geospatial and temporal characteristics; using medians to generate consensus rankings for biological data; a truly dynamic data structure for top-k queries on uncertain data; efficient storage and temporal query evaluation in hierarchical data archiving systems; update propagation in a streaming warehouse; probabilistic time consistent queries over moving objects; knowledge annotations in scientific workflows: an implementation in Kepler; improving workflow fault tolerance through provenance-based recovery; provenance-enabled automatic data publishing; a panel discussion on data intensive science: moving towards solutions; querying shortest path distance with bounded errors in large graphs; and a flexible graph pattern matching framework via indexing.
The proceedings contain 46 papers. The special focus in this conference is on scientific and statistical database management. The topics include: new challenges in petascale scientific databases; adventures in the blogosphere; query planning for searching inter-dependent deep-web databases; summarizing two-dimensional data with skyline-based statistical descriptors; query selectivity estimation for uncertain data; disclosure risks of distance preserving data transformations; privacy-preserving publication of user locations in the proximity of sensitive sites; a probabilistic framework for building privacy-preserving synopses of multi-dimensional data; efficient similarity search for tree-structured data; hierarchical graph embedding for efficient query processing in very large traffic networks; finding frequent items over general update streams; efficiently discovering recent frequent items in data streams; prioritized evaluation of continuous moving queries over streaming locations; adaptive request scheduling for parallel scientific web services; breaking the curse of cardinality on bitmap indexes; a new approach for optimization of dynamic metric access methods using an algorithm of effective deletion; an ontology-based index to retrieve documents with geographic information; and mining temporal association patterns under a similarity constraint.
In most cases, unique identifiers are required to join data from different databases. If global unique keys are absent or corrupted, supplementing data extracted from different sources becomes difficult. The main question is: does a given record relate to an entity that is identical to the entity corresponding to another record, or not? This leads to a classification problem with at least two classes: identical and not identical. Classifying pairs of records requires a three-step procedure. The first step is to define suitable common properties (attributes) of the data for all the different sources. Second, to allow comparisons, the values of the records are transformed to these common properties. Finally, the classification is performed on an almost finite subset, the range of an appropriate comparison function. Different classification techniques can be applied, such as association rules, classification trees, neural networks, or record linkage techniques. The unknown parameters of the classification rules are computed by sampling and supervised learning. Unbiased error rates can be estimated, for instance, by cross-validation. Special attention must be paid to controlling the computational complexity of the identification process. The approach is illustrated with data from two library databases and from the planned German administrative record census, which will become a substitute for a regular census.
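The three-step procedure can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the attribute names (`title`, `author`, `year`), the 0.85 similarity threshold, the binary agreement flags, and the majority-vote lookup table used as the classifier are all hypothetical choices made here for brevity.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Step 1 (assumed attributes): properties shared by all sources.
COMMON_ATTRIBUTES = ("title", "author", "year")


def normalize(record):
    """Step 2: map a source record onto the common attributes."""
    return {key: str(record.get(key, "")).strip().lower() for key in COMMON_ATTRIBUTES}


def compare(left, right, threshold=0.85):
    """Comparison function: one agreement flag (0 or 1) per common attribute.

    The range is a finite set of tuples; empty values count as disagreement.
    """
    a, b = normalize(left), normalize(right)
    return tuple(
        int(bool(a[key]) and SequenceMatcher(None, a[key], b[key]).ratio() >= threshold)
        for key in COMMON_ATTRIBUTES
    )


def train(vectors, labels):
    """Step 3, supervised: majority label observed for each comparison vector."""
    counts = defaultdict(Counter)
    for vector, label in zip(vectors, labels):
        counts[vector][label] += 1
    return {vector: c.most_common(1)[0][0] for vector, c in counts.items()}


def classify(model, vector, default=False):
    """Label a pair as identical (True) or not; unseen vectors get the default."""
    return model.get(vector, default)


def cross_validated_error(vectors, labels, folds=3):
    """Estimate the error rate by k-fold cross-validation (interleaved folds)."""
    errors = 0
    for fold in range(folds):
        train_idx = [i for i in range(len(vectors)) if i % folds != fold]
        test_idx = [i for i in range(len(vectors)) if i % folds == fold]
        model = train([vectors[i] for i in train_idx], [labels[i] for i in train_idx])
        errors += sum(classify(model, vectors[i]) != labels[i] for i in test_idx)
    return errors / len(vectors)
```

Because the classifier only ever sees comparison vectors, its cost depends on the number of distinct vectors rather than on the raw record contents. The number of candidate pairs still grows quadratically with the number of records, which is where the computational complexity of the identification process has to be controlled.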
ISBN: 9783642312342 (print)
The proceedings contain 51 papers. The topics discussed include: navigating oceans of data; probabilistic range monitoring of streaming uncertain positions in GeoSocial networks; probabilistic frequent pattern growth for itemset mining in uncertain databases; evaluating trajectory queries over imprecise location data; efficient range queries over uncertain strings; continuous probabilistic sum queries in wireless sensor networks with ranges; partitioning and multi-core parallelization of multi-equation forecast models; integrating GPU-accelerated sequence alignment and SNP detection for genome resequencing analysis; discovering representative skyline points over distributed data; SkyQuery: an implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases; regular path queries on large graphs; and sampling connected induced subgraphs uniformly at random.
Object-oriented databases (OODBs) are in many ways a better match for scientific data management than conventional record-oriented database systems. The main benefit is in data model expressivity. The experiences of the author using OODBs for scientific data in the domains of computational chemistry and materials science are recounted. The main section of the paper deals with areas that need improvement for OODBs to support scientific applications well.
In order to achieve a good design of a probabilistic database, we introduce statistical join dependencies whose definition shows an apparent analogy with relational join dependencies. Indeed, a certain number of formal properties of relational join dependencies are shared by statistical join dependencies; so, we can sometimes apply the design techniques employed for relational databases to probabilistic databases.
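For orientation, the relational notion underlying the analogy can be written out. The first formula is the standard definition of a relational join dependency. The second is one plausible probabilistic counterpart for the two-component case, namely a factorization of a joint distribution over its marginals; it is an assumed illustration, not the definition given in the paper. The symbols r, R_i, P, X, Y, and Z are introduced here only for this sketch.

```latex
% Relational join dependency: a relation r over schema R satisfies
% \bowtie[R_1, \dots, R_k] if and only if
r = \pi_{R_1}(r) \bowtie \cdots \bowtie \pi_{R_k}(r),
\qquad R_1 \cup \cdots \cup R_k = R.

% Assumed probabilistic analogue for two components R_1 = XZ and R_2 = YZ
% (conditional independence of X and Y given Z):
P(x, y, z) \, P(z) = P(x, z) \, P(y, z)
\quad \text{for all } x, y, z.
```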
ISBN: 0769519644 (print)
This paper describes a Web-based query system for semantically heterogeneous geospatial data. Although Web-based information systems are currently being developed by the GIS community to provide data discovery and download capabilities for distributed Web data sets, they do not include the ability to pose DBMS-style queries over the data. We developed a system that provides DBMS querying and also resolves the semantic differences that occur in distributed sources. We are working in the context of a proposed statewide land information system.