ISBN: 9781605585543 (print)
Column statistics are an important element of cardinality estimation frameworks. More accurate estimates allow the optimizer of an RDBMS to generate better plans and improve the overall system's efficiency. This paper introduces filtered statistics, which model value distribution over a set of rows restricted by a predicate. This feature, available in Microsoft SQL Server, can be used to handle column correlation, as well as to focus on interesting data ranges. In particular, it fits well for scenarios with logical subtables, such as flexible schema or multi-tenant applications. Integration with the existing cardinality estimation infrastructure is presented.
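The feature is exposed through Transact-SQL; the sketch below uses a hypothetical table and predicate. The histogram is built only over the rows satisfying the WHERE clause, so estimates for that logical subtable become more accurate.

    -- Hypothetical example: statistics restricted to open orders only.
    CREATE STATISTICS Stats_Orders_Open_OrderDate
    ON dbo.Orders (OrderDate)
    WHERE Status = 'Open';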
ISBN: 9789898425799 (print)
Automatically mining entities, relationships, and semantics from unstructured documents and storing these in relational tables greatly simplifies and unifies the workflows and user experiences of database products at the enterprise. This paper describes three linear-scale, incremental, and fully automatic semantic mining algorithms that are at the foundation of the new Semantic Platform being released in the next version of SQL Server. The target workload is large (10 to 100 million document) enterprise corpora. At these scales, anything short of linear-scale and incremental is costly to deploy. These three algorithms give rise to three weighted physical indexes: the Tag Index (top keywords in each document); the Document Similarity Index (top closely related documents, given any document); and the Phrase Similarity Index (top semantically related phrases, given any phrase), which are then queryable through the SQL interface. The need for specifically creating these three indexes was motivated by observing typical stages of document research and gap analysis, given current tools and technology at the enterprise. We describe the mining algorithms and architecture, and outline some compelling user experiences that are enabled by these indexes.
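These indexes surface in SQL Server's statistical semantic search through table-valued functions; the sketch below is illustrative only and assumes a hypothetical Documents table whose DocContent column has full-text and semantic indexing enabled, keyed by an integer DocID.

    -- Tag Index: top key phrases for the document with DocID = 1.
    SELECT TOP (10) keyphrase, score
    FROM SEMANTICKEYPHRASETABLE(Documents, DocContent, 1)
    ORDER BY score DESC;

    -- Document Similarity Index: documents most closely related to DocID = 1.
    SELECT TOP (10) matched_document_key, score
    FROM SEMANTICSIMILARITYTABLE(Documents, DocContent, 1)
    ORDER BY score DESC;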
ISBN: 9781424418367 (print)
Efficient and convenient handling of heterogeneous data is a current challenge for data management systems. In this paper, we discuss several common relational approaches to representing heterogeneity and argue for a design based on a single wide table, referred to as a flexible schema. For this scenario, we focus on partial indexation and its support for efficient data storage and processing. Filtered indices provide partial indexation functionality in the Microsoft SQL Server product. We describe here the implementation of this feature, including index utilization in queries, index maintenance, and query parameterization issues. Our performance experiments validate the expected benefits of the approach in our implementation.
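Filtered indices use a WHERE-restricted form in Transact-SQL; the sketch below is a hypothetical illustration for a multi-tenant wide table, where only the rows of one logical subtable are indexed.

    -- Hypothetical example: index only the rows belonging to tenant 42.
    CREATE NONCLUSTERED INDEX IX_Entities_Tenant42_Name
    ON dbo.Entities (EntityName)
    WHERE TenantId = 42;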
We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of "query-by-humming" (QBH) applications which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and noncumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.
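For context, the similarity score in such an HMM-based approach is the probability of the observed query given the target's model; the generic form below is the standard HMM likelihood, of which the paper's error model is a specialized variant (not reproduced here):

\[
P(O \mid \lambda) = \sum_{q_1,\dots,q_T} \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
\]

where the transition probabilities a_{ij} absorb insertions, deletions, and tempo variation, the emission probabilities b_i(o_t) absorb local pitch and rhythm errors, and the parameters can be estimated from example queries with standard expectation-maximization training.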
Microsoft StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications that run continuous queries over high-rate streaming events. StreamInsight adopts a temporal strea...
Microsoft StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications. StreamInsight adopts a deterministic stream model that leverages a temporal algebra as the underl...
Microsoft's SQL Server Web Services Toolkit (WSTK), which is used to build web services for relational databases, is discussed. The toolkit allows construction of XML views of relational data stored in SQL Server and querying and updating of the relational data through these views. Users can then request this data as XML, and WSTK retrieves relational rowsets from the database, converting them to XML hierarchies on the fly, transparently to users. The SQL Server Web Services Toolkit lets developers access databases using programming models that are natural to client-side programming languages, allowing databases to be easily converted into web services.
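Underlying this kind of tooling is SQL Server's ability to return a rowset as an XML hierarchy; the query below is a minimal illustration of that conversion using the FOR XML clause, not the WSTK API itself, and the table names are hypothetical.

    -- Hypothetical example: return customers and their orders as nested XML elements.
    SELECT c.CustomerID, c.CompanyName, o.OrderID, o.OrderDate
    FROM Customers AS c
    JOIN Orders AS o ON o.CustomerID = c.CustomerID
    FOR XML AUTO, ELEMENTS;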
A data mining component is included in Microsoft SQL Server 2000 and SQL Server 2005, one of the most popular DBMSs. This gives a push for data mining technologies to move from a niche towards the mainstream. Apart from a few algorithms, the main contribution of SQL Server Data Mining is the implementation of OLE DB for Data Mining. OLE DB for Data Mining is an industry standard led by Microsoft and supported by a number of ISVs. It leverages two existing relational technologies: SQL and OLE DB. It defines a SQL-style language for data mining based on relational concepts. More recently, Microsoft, Hyperion, SAS, and a few other BI vendors formed the XML for Analysis Council. XML for Analysis covers both OLAP and data mining. The goal is to allow consumer applications to query various BI packages from different platforms. This paper gives an overview of OLE DB for Data Mining and XML for Analysis. It also shows how to build data mining applications using these APIs.
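A brief sketch of the SQL-like syntax defined by OLE DB for Data Mining is shown below; model, column, and data source names are illustrative, not taken from the paper.

    -- Define a mining model over relational-style columns.
    CREATE MINING MODEL CreditRisk
    (
        CustomerID   LONG   KEY,
        Age          LONG   CONTINUOUS,
        YearlyIncome DOUBLE CONTINUOUS,
        RiskLevel    TEXT   DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees;

    -- Train the model from a relational rowset supplied by the data source.
    INSERT INTO CreditRisk (CustomerID, Age, YearlyIncome, RiskLevel)
    OPENQUERY(CustomerDataSource,
        'SELECT CustomerID, Age, YearlyIncome, RiskLevel FROM Customers');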
ISBN: 9781605582337 (print)
Cross feature testing is a generic term that refers to testing two or more features together. In this paper, we discuss what cross feature testing in a database system specifically entails. We identify and explain some of the dependencies amongst feature interactions and how they are categorized. We also review some of the problem symptoms that can occur from cross feature failures. Some strategies to address these issues as they relate to the database are also discussed, but a thorough analysis of any cross feature test solutions is beyond the scope of this paper, whose goal is to provide a basis for future dialog on this topic. Copyright 2008 ACM.
ISBN: 9781605582337 (print)
The query optimizer models data distribution and access paths to make the optimal plan choice for a given query. Sometimes the plan selection is poor because of modeling limitations, outdated statistics, incorrect optimization heuristics, etc. Hence it is useful to examine the plan choice made by the optimizer from an execution perspective and to impose validation rules on the actual execution plan to evaluate plan suitability. This approach treats the optimizer as a black box: the plan validation is based on the queries and data instead of the optimizer implementation details. This paper describes XPC, a rule-based tool for Microsoft SQL Server [1] that helps users and developers achieve a better understanding of plan performance. We apply ideas similar to code profilers [2] to examine plan execution performance, applying heuristic rules to the actual execution profile to probe for inefficiencies. This paper gives an overview of XPC, describes its implementation, and presents rules showing how XPC is useful in targeting plan performance issues. Copyright 2008 ACM.
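One typical validation rule of this kind compares the optimizer's row count estimates against the actual row counts recorded in the actual execution plan; the snippet below is a hypothetical illustration of capturing that profile in SQL Server, not XPC's actual implementation.

    -- Capture the actual execution plan; its XML carries both EstimateRows and
    -- ActualRows per operator, which a rule can compare for large deviations.
    SET STATISTICS XML ON;
    SELECT o.CustomerID, COUNT(*) AS OrderCount   -- table and columns are illustrative
    FROM dbo.Orders AS o
    WHERE o.OrderDate >= '2008-01-01'
    GROUP BY o.CustomerID;
    SET STATISTICS XML OFF;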