This paper introduces a simple and powerful extension of stratified DATALOG which makes it possible to express various DB-complexity classes. The new language, called DATALOG(s,c,p), extends DATALOG with stratified negation, a non-deterministic construct called choice, and a weak form of constraints called preference rules: constraints that should be respected but, if they cannot eventually be enforced, only invalidate the portions of the program they concern. Although DATALOG with stratified negation cannot express all polynomial-time queries [20], the introduction of the non-deterministic choice construct makes it possible to express exactly the 'deterministic fragment' of the class of DB-queries P under the non-deterministic semantics, NP under the possible semantics, and coNP under the certain semantics. The introduction of preference rules further increases the expressive power of the language, making it possible to express the complexity classes Sigma_2^p under the possibility semantics and Pi_2^p under the certainty semantics.
Probability theory is mathematically the best understood paradigm for modeling and manipulating uncertain information. Probabilities of complex events can be computed from those of basic events on which they depend, using any of a number of strategies. Which strategy is appropriate depends very much on the known interdependencies among the events involved. Previous work on probabilistic databases has assumed a fixed and restrictive combination strategy (e.g., assuming all events are pairwise independent). In this article, we characterize, using postulates, whole classes of strategies for conjunction, disjunction, and negation, meaningful from the viewpoint of probability theory. (1) We propose a probabilistic relational data model and a generic probabilistic relational algebra that neatly captures various strategies satisfying the postulates, within a single unified framework. (2) We show that as long as the chosen strategies can be computed in polynomial time, queries in the positive fragment of the probabilistic relational algebra have essentially the same data complexity as classical relational algebra. (3) We establish various containments and equivalences between algebraic expressions, similar in spirit to those in classical algebra. (4) We develop algorithms for maintaining materialized probabilistic views. (5) Based on these ideas, we have developed a prototype probabilistic database system called ProbView on top of Dbase V.0. We validate our complexity results with experiments and show that rewriting certain types of queries to other equivalent forms often yields substantial savings.
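As an illustration of the strategy-parameterised combination the abstract describes, here is a minimal Python sketch of three textbook strategies for conjunction, plus disjunction and negation. The function names are mine, not the paper's, and the generic algebra itself is far richer than this sketch.

```python
# Hypothetical helper names; the paper's generic probabilistic relational
# algebra is parameterised by combination strategies such as these.

def conj_independence(p, q):
    """P(A and B) assuming A and B are independent."""
    return p * q

def conj_positive_correlation(p, q):
    """P(A and B) assuming maximal positive correlation."""
    return min(p, q)

def conj_ignorance(p, q):
    """Frechet interval for P(A and B) with no dependency information."""
    return (max(0.0, p + q - 1.0), min(p, q))

def disj_independence(p, q):
    """P(A or B) assuming independence."""
    return p + q - p * q

def neg(p):
    """P(not A)."""
    return 1.0 - p
```

Each strategy is a plausible choice under different known interdependencies, which is exactly why the algebra leaves the choice open rather than hard-wiring independence.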
The validity of a monadic second-order (MS) expressible property can be checked in linear time on graphs of bounded tree-width or clique-width given with appropriate decompositions. This result is proved by constructing, from the MS sentence expressing the property and an integer bounding the tree-width or clique-width of the input graph, a finite automaton intended to run bottom-up on the algebraic term representing a decomposition of the input graph. As the transition tables of these automata are huge, we cannot construct them in practice; instead we use fly-automata, whose states and transitions are computed "on the fly", only when needed for a particular input. Furthermore, we allow infinite sets of states and we equip automata with output functions. Thus, they can check properties that are not MS expressible and compute values, for example, the number of p-colorings of a graph. We obtain XP and FPT graph algorithms, parameterized by tree-width or clique-width. We show how to construct such algorithms easily by combining predefined automata for basic functions and properties. These combinations reflect the structure of the MS formula that specifies the property to check or the function to compute. (C) 2015 Elsevier B.V. All rights reserved.
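As a toy analogue of such a bottom-up automaton with an output function (far simpler than real fly-automata on clique-width terms, and only meant to illustrate the bottom-up style with states built on demand), one can count the proper p-colorings of a tree:

```python
# Toy analogue of a bottom-up automaton with an output function:
# counting proper p-colorings of an undirected tree, computed bottom-up
# from the leaves. States are created only when a subtree is visited.

def count_colorings(tree, root, p):
    """tree: adjacency dict of an undirected tree.
    Returns the number of proper p-colorings."""
    def go(v, parent):
        # state[c] = number of colorings of v's subtree with v coloured c
        state = [1] * p
        for w in tree[v]:
            if w == parent:
                continue
            child = go(w, v)
            for c in range(p):
                # the child may take any colour different from c
                state[c] *= sum(child) - child[c]
        return state
    return sum(go(root, None))

# A path on 3 vertices has p * (p-1)^2 proper p-colorings.
path3 = {0: [1], 1: [0, 2], 2: [1]}
```

The per-vertex state plays the role of the automaton state, and the final `sum` is the output function.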
Public repositories have contributed to the maturation of experimental methodology in machine learning. Publicly available data sets have allowed researchers to empirically assess their learners and, jointly with open source machine learning software, they have favoured the emergence of comparative analyses of learners' performance over a common framework. These studies have brought standard procedures to evaluate machine learning techniques. However, current claims, such as the superiority of enhanced algorithms, are biased by unsupported assumptions made in common practice. In this paper, the early steps of the methodology, which concern data set selection, are inspected. In particular, the exploitation of the most popular data repository in machine learning, the UCI repository, is examined. We analyse the type, complexity, and use of UCI data sets. The study recommends the design of a mindful data repository, UCI+, which should include a set of properly characterised data sets consisting of a complete and representative sample of real-world problems, enriched with artificial benchmarks. The ultimate goal of the UCI+ is to lay the foundations towards a well-supported methodology for learner assessment. (C) 2013 Elsevier Inc. All rights reserved.
This article is concerned with the handling of inconsistencies occurring in the combination of description logics and rules, especially in hybrid MKNF knowledge bases. More precisely, we present a paraconsistent semantics for hybrid MKNF knowledge bases (called para-MKNF knowledge bases) based on four-valued logic as proposed by Belnap. We also reduce this paraconsistent semantics to the stable model semantics via a linear transformation operator, which shows the relationship between the two semantics and indicates that the data complexity in our paradigm is not higher than that of classical reasoning. Moreover, we provide fixpoint operators to compute paraconsistent MKNF models, each suited to different kinds of rules. Finally, we present the data complexity of instance checking in different para-MKNF knowledge bases.
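The four-valued basis of the semantics is easy to sketch. In Belnap's FOUR, a statement can be true (T), false (F), both (B, inconsistent) or neither (N, unknown); conjunction and disjunction are meet and join in the truth order F <= B, N <= T, and negation swaps T and F while fixing B and N. The encoding below is illustrative only, not the article's formalism:

```python
# Belnap's four-valued logic FOUR, encoded as pairs
# (has_evidence_for, has_evidence_against).
T, F, B, N = (True, False), (False, True), (True, True), (False, False)

def neg(v):
    # swap evidence for and against: fixes B and N, swaps T and F
    for_, against = v
    return (against, for_)

def conj(v, w):
    # evidence FOR a conjunction needs both conjuncts;
    # evidence AGAINST it needs either conjunct
    return (v[0] and w[0], v[1] or w[1])

def disj(v, w):
    return (v[0] or w[0], v[1] and w[1])
```

Note the paraconsistent behaviour: an inconsistent statement (B) does not make everything true, unlike in classical two-valued logic.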
The PRINCE cipher is the result of a cooperation between the Technical University of Denmark, NXP Semiconductors and the Ruhr University Bochum. The cipher was designed for extremely low-latency encryption and instant response time. PRINCE has already gained a lot of attention from the academic community; however, most of the attacks are theoretical, usually with very high time or data complexity. This work helps to fill the gap with more practically oriented attacks, under more realistic scenarios and complexities. New attacks are presented, on up to seven rounds, relying on integral and higher-order differential cryptanalysis.
This work presents a literature review of multiple classifier systems based on the dynamic selection of classifiers. First, it briefly reviews some basic concepts and definitions related to such a classification approach and then it presents the state of the art organized according to a proposed taxonomy. In addition, a two-step analysis is applied to the results of the main methods reported in the literature, considering different classification problems. The first step is based on statistical analyses of the significance of these results. The idea is to identify the problems for which a significant contribution can be observed in terms of classification performance by using a dynamic selection approach. The second step, based on data complexity measures, is used to investigate whether or not a relation exists between the possible performance contribution and the complexity of the classification problem. From this comprehensive study, we observed that, for some classification problems, the performance contribution of the dynamic selection approach is statistically significant when compared to that of a single classifier. In addition, we found evidence of a relation between the observed performance contribution and the complexity of the classification problem. These observations allow us to suggest, from the classification problem complexity, that further work should be done to predict whether or not to use a dynamic selection approach. (C) 2014 Elsevier Ltd. All rights reserved.
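Dynamic selection itself can be sketched with the classic overall-local-accuracy (OLA) rule, one of many methods such reviews cover: for each test point, pick the classifier that is most accurate on the point's nearest neighbours in a validation set. Names and data layout here are illustrative.

```python
# Minimal sketch of dynamic classifier selection via overall local
# accuracy (OLA). Classifiers are plain callables mapping a point to a
# class label; X_val / y_val form a held-out validation set.
import math

def ola_predict(classifiers, X_val, y_val, x, k=3):
    # indices of the k validation points nearest to x (Euclidean)
    order = sorted(range(len(X_val)),
                   key=lambda i: math.dist(x, X_val[i]))[:k]
    # local accuracy of a classifier on that neighbourhood
    def local_acc(clf):
        return sum(clf(X_val[i]) == y_val[i] for i in order) / k
    # select the locally best classifier and let it predict
    best = max(classifiers, key=local_acc)
    return best(x)
```

The point of the scheme is that different classifiers win in different regions of the feature space, so the selection is deferred until the query point is known.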
Top-down induction of decision trees is a simple and powerful method of pattern classification. In a decision tree, each node partitions the available patterns into two or more sets. New nodes are created to handle each of the resulting partitions and the process continues. A node is considered terminal if it satisfies some stopping criteria (for example, purity, i.e., all patterns at the node are from a single class). Decision trees may be univariate, linear multivariate, or nonlinear multivariate depending on whether a single attribute, a linear function of all the attributes, or a nonlinear function of all the attributes is used for the partitioning at each node of the decision tree. Though nonlinear multivariate decision trees are the most powerful, they are more susceptible to the risks of overfitting. In this paper, we propose to perform model selection at each decision node to build omnivariate decision trees. The model selection is done using a novel classifiability measure that captures the possible sources of misclassification with relative ease and is able to accurately reflect the complexity of the subproblem at each node. The proposed approach is fast and does not suffer from as high a computational burden as that incurred by typical model selection algorithms. Empirical results over 26 data sets indicate that our approach is faster and achieves better classification accuracy compared to statistical model selection algorithms.
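The node-level model-selection idea can be sketched as follows. The scoring criterion below (impurity plus a penalty per model parameter) is a stand-in of my own, not the paper's classifiability measure, which the abstract does not define; the candidate interface is likewise hypothetical.

```python
# Sketch of per-node model selection for an omnivariate tree: candidate
# split models of increasing capacity compete at each node, and the
# simplest adequate one wins via a complexity-penalised score.

def select_split(candidates, X, y, penalty=0.01):
    """candidates: list of (name, n_params, fit), where fit(X, y)
    returns a boolean predicate over samples. Returns the winner's name."""
    def score(cand):
        name, n_params, fit = cand
        split = fit(X, y)
        left = [c for x, c in zip(X, y) if split(x)]
        right = [c for x, c in zip(X, y) if not split(x)]
        # impurity proxy: minority fraction on each side of the split
        def impurity(labels):
            if not labels:
                return 0.0
            top = max(labels.count(c) for c in set(labels))
            return 1.0 - top / len(labels)
        return impurity(left) + impurity(right) + penalty * n_params
    return min(candidates, key=score)[0]
```

With this shape, a univariate split beats an equally accurate multivariate one because it pays a smaller parameter penalty, mirroring the paper's preference for the simplest model that fits the subproblem.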
The excellence of a given learner is usually claimed through a performance comparison with other learners over a collection of data sets. Too often, researchers are not aware of the impact of their data selection on the results. Their test beds are small, and the selection of the data sets is not supported by any previous data analysis. Conclusions drawn on such test beds cannot be generalised, because particular data characteristics may favour certain learners unnoticeably. This work raises these issues and proposes the characterisation of data sets using complexity measures, which can be helpful for both guiding experimental design and explaining the behaviour of learners. (C) 2012 Elsevier Ltd. All rights reserved.
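One concrete example of such a data complexity measure is Fisher's discriminant ratio (commonly called F1 in this literature), sketched here for a two-class problem; the implementation details are mine:

```python
# Fisher's discriminant ratio (F1): score each feature by
#   f = (mu1 - mu2)^2 / (var1 + var2)
# and keep the maximum over features. Larger values mean the classes
# are easier to separate along at least one single axis.
from statistics import mean, pvariance

def fisher_f1(X, y):
    classes = sorted(set(y))
    assert len(classes) == 2, "this F1 sketch handles two classes"
    a = [x for x, c in zip(X, y) if c == classes[0]]
    b = [x for x, c in zip(X, y) if c == classes[1]]
    scores = []
    for j in range(len(X[0])):
        fa, fb = [v[j] for v in a], [v[j] for v in b]
        denom = pvariance(fa) + pvariance(fb)
        if denom == 0:
            # constant feature: uninformative if means agree, else trivial
            scores.append(0.0 if mean(fa) == mean(fb) else float("inf"))
        else:
            scores.append((mean(fa) - mean(fb)) ** 2 / denom)
    return max(scores)
```

Characterising a test bed with measures like this one is exactly what lets an experimenter notice when the chosen data sets all favour, say, linearly separable structure.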
In machine learning, the performance of a classifier depends on both the classifier model and the separability/complexity of datasets. To quantitatively measure the separability of datasets, in this study, we propose an intrinsic measure, the Distance-based Separability Index (DSI), which is independent of the classifier model. We then formally show that the DSI can indicate whether the distributions of datasets are identical for any dimensionality. DSI can measure the separability of datasets because we consider the situation in which different classes of data are mixed in the same distribution to be the most difficult for classifiers to separate. DSI is verified to be an effective separability measure by comparison with state-of-the-art separability/complexity measures on synthetic datasets and real datasets (CIFAR-10/100). Having demonstrated the DSI's ability to compare distributions of samples, our other studies show that it can be used in other separability-based applications, such as measuring the performance of generative adversarial networks (GANs) and evaluating the results of clustering methods.
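The abstract does not give the DSI formula, so the sketch below only illustrates the general distance-based idea, not the paper's exact definition: compare the distribution of within-class pairwise distances against the between-class ones. If the two distributions coincide, the classes are mixed in a single distribution and hard to separate.

```python
# Loose illustration of a distance-based separability score (NOT the
# paper's exact DSI): the Kolmogorov-Smirnov gap between the empirical
# distributions of within-class and between-class pairwise distances.
import itertools
import math

def distance_sets(X, y):
    within, between = [], []
    for (xa, ca), (xb, cb) in itertools.combinations(zip(X, y), 2):
        d = math.dist(xa, xb)
        (within if ca == cb else between).append(d)
    return within, between

def ks_statistic(s, t):
    """Two-sample KS statistic: max gap between the two ECDFs."""
    gap = 0.0
    for v in sorted(set(s) | set(t)):
        fs = sum(x <= v for x in s) / len(s)
        ft = sum(x <= v for x in t) / len(t)
        gap = max(gap, abs(fs - ft))
    return gap

def separability(X, y):
    within, between = distance_sets(X, y)
    return ks_statistic(within, between)  # near 0: mixed; near 1: separable
```

A score near 1 says within-class distances are distributed very differently from between-class ones (well-separated clusters); a score near 0 says the two distance populations are indistinguishable, the mixed-distribution worst case the abstract describes.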