Initially introduced by Peter Hammer, logical analysis of data (LAD) is a methodology that aims at computing a logical justification for dividing a group of data into two groups of observations, usually called the pos...
详细信息
Initially introduced by Peter Hammer, logical analysis of data (LAD) is a methodology that aims at computing a logical justification for dividing a group of data into two groups of observations, usually called the positive and negative groups. Let us consider this partition into positive and negative groups as the description of a partially defined Boolean function;the data are then processed to identify a subset of attributes, whose values may be used to characterize the observations of the positive groups against those of the negative group. LAD constitutes an interesting rule-based learning alternative to classic statistical learning techniques and has many practical applications. Nevertheless, the computation of group characterization may be costly, depending on the properties of the data instances. A major aim of our work is to provide effective tools for speeding up the computations, by computing some a priori probability that a given set of attributes does characterize the positive and negative groups. To this effect, we propose several models for representing the data set of observations, according to the information we have on it. These models, and the probabilities they allow us to compute, are also helpful for quickly assessing some properties of the real data at hand;furthermore, they may help us to better analyze and understand the computational difficulties encountered by solving methods. Once our models have been established, the mathematical tools for computing probabilities come from analytic combinatorics. They allow us to express the desired probabilities as ratios of generating function coefficients, which then provide a quick computation of their numerical values. A further, long-range goal of this paper is to show that the methods of analytic combinatorics can help in analyzing the performance of various algorithms in LAD and related fields.
Multitudinous probabilistic and combinatorial objects are associated with generating functions satisfying a composition scheme F(z) = G(H(z)). The analysis becomes challenging when this scheme is critical (i.e., G and...
详细信息
Multitudinous probabilistic and combinatorial objects are associated with generating functions satisfying a composition scheme F(z) = G(H(z)). The analysis becomes challenging when this scheme is critical (i.e., G and H are simultaneously singular). Motivated by many examples (random mappings, planar maps, directed lattice paths), we consider a natural extension of this scheme, namely F(z, u) = G(uH(z))M(z). We also consider a variant of this scheme, which allows us to analyse the number of H-components of a given size in F. We prove that these two models lead to a rich world of limit laws, where we identify the key role played by a new universal law introduced in this article: the three-parameter Mittag-Leffler distribution, which is essentially the product of a beta and a Mittag-Leffler distribution. We also prove (double) phase transitions, additionally involving Boltzmann and mixed Poisson distributions, bringing a unified explanation of the associated thresholds. In all cases we obtain moment convergence and local limit theorems. We end with extensions of the critical composition scheme to a cycle scheme and to the multivariate case, leading to product distributions. Applications are presented for random walks, trees (supertrees of trees, increasingly labelled trees, preferential attachment trees), triangular P & oacute;lya urns, and the Chinese restaurant process.
We consider a serialized coin-tossing leader election algorithm that proceeds in rounds until a winner is chosen, or all contestants are eliminated. The analysis allows for either biased or fair coins. We find the exa...
详细信息
We consider a serialized coin-tossing leader election algorithm that proceeds in rounds until a winner is chosen, or all contestants are eliminated. The analysis allows for either biased or fair coins. We find the exact distribution for the duration of any fixed contestant;asymptotically, it turns out to be a geometric distribution. Rice's method (an analytic technique) shows that the moments of the duration contain oscillations, which we give explicitly for the mean and variance. We also use convergence in the Wasserstein metric space to show that the distribution of the total number of coin flips (among all participants), suitably normalized, approaches a normal limiting random variable.
We analyze extremal statistics in non-crossing configurations on the n vertices of a convex polygon. We prove that the maximum degree and the largest component are of logarithmic order, and that, suitably scaled, they...
详细信息
We analyze extremal statistics in non-crossing configurations on the n vertices of a convex polygon. We prove that the maximum degree and the largest component are of logarithmic order, and that, suitably scaled, they converge to a well-defined constant. We also prove that the diameter is of order root n. The proofs are based on singularity analysis, an application of the first and second moment method, and on the analysis of iterated functions. (C) 2014 Elsevier B.V. All rights reserved.
In this article, we study the impact of applying simple reduction rules to random syntactic formulas encoded as trees. We assume that there is an operator that has an absorbing pattern and prove that if we use this pr...
详细信息
In this article, we study the impact of applying simple reduction rules to random syntactic formulas encoded as trees. We assume that there is an operator that has an absorbing pattern and prove that if we use this property to simplify a uniform random expression with n nodes, then the expected size of the result is bounded by a constant. The same holds for higher moments, establishing the lack of expressivity of uniform random expressions. Our framework is quite general as we consider expressions defined by systems of combinatorial equations. For our proofs, we rely on Drmota's multidimensional theorem for systems of generating functions.
We use analytic methods to study the probability of a family of motifs not occurring on the fringe of a random recursive tree. We obtain an asymptotic formula for this probability bymeans of singularity analysis. Two ...
详细信息
We use analytic methods to study the probability of a family of motifs not occurring on the fringe of a random recursive tree. We obtain an asymptotic formula for this probability bymeans of singularity analysis. Two regimes are treated in particular: the case that a fixed proportion of motifs of size. is forbidden, and the case that a fixed number of motifs of size. is forbidden. In both cases, we observe phase transitions as the size of the random tree and the size of the motif tend to infinity. The required asymptotic expansions of the dominant singularities were first found by computer experiments and only later made rigorous.
The relationship between a stable multivariable polynomial p(z) and the Fourier coefficients of its spectral density function 1/vertical bar p(z)vertical bar(2), is further investigated. In this paper we focus on the ...
详细信息
The relationship between a stable multivariable polynomial p(z) and the Fourier coefficients of its spectral density function 1/vertical bar p(z)vertical bar(2), is further investigated. In this paper we focus on the radial asymptotics of the Fourier coefficients for a specific choice of a two variable polynomial. Hypergeometric functions appear in the analysis, and new results are derived for these as well.
We present counting methods for some special classes of multivariate polynomials over a finite field, namely, the reducible ones, the s-powerful ones (divisible by the sth power of a nonconstant polynomial), and the r...
详细信息
We present counting methods for some special classes of multivariate polynomials over a finite field, namely, the reducible ones, the s-powerful ones (divisible by the sth power of a nonconstant polynomial), and the relatively irreducible ones (irreducible but reducible over an extension field). One approach employs generating functions, and another one uses a combinatorial method. They yield exact formulas and approximations with relative errors that essentially decrease exponentially in the input size.
In this paper, the relation between the Glushkov automaton (A(pos)) and the partial derivative automaton (A(pd)) of a given regular expression, in terms of transition complexity, is studied. The average transition com...
详细信息
In this paper, the relation between the Glushkov automaton (A(pos)) and the partial derivative automaton (A(pd)) of a given regular expression, in terms of transition complexity, is studied. The average transition complexity of A(pos) was proved by Nicaud to be linear in the size of the corresponding expression. This result was obtained using an upper bound of the number of transitions of A(pos). Here we present a new quadratic construction of A(pos) that leads to a more elegant and straightforward implementation, and that allows the exact counting of the number of transitions. Based on that, a better estimation of the average size is presented. Asymptotically, and as the alphabet size grows, the number of transitions per state is on average 2. Broda et al. computed an upper bound for the ratio of the number of states of A(pd) to the number of states of A(pos), which is about 1/2 for large alphabet sizes. Here we show how to obtain an upper bound for the number of transitions in A(pd), which we then use to get an average case approximation. In conclusion, assymptotically, and for large alphabets, the size of Apd is half the size of the . This is corroborated by some experiments, even for small alphabets and small regular expressions.
In this article we study the effect of simple semantic reductions on random BST-like expression-trees. Such random unary-binary expression-trees are often used in benchmarks for model-checking tools. We consider the r...
详细信息
ISBN:
(纸本)9783959771801
In this article we study the effect of simple semantic reductions on random BST-like expression-trees. Such random unary-binary expression-trees are often used in benchmarks for model-checking tools. We consider the reduction induced by an absorbing pattern for some given operator., which we apply bottom-up, producing an equivalent (and smaller) tree-expression. Our main result concerns the expected size of a random tree, of given input size n -> infinity, after reduction. We show that there are two different thresholds, leading to a total of five regimes, ranging from no significant reduction at all, to almost complete reduction. These regimes are completely characterized according to the probability of the absorbing operator. Our results prove that random BST-like trees have to be considered with care, and that they offer a richer range of behaviours than uniform random trees.
暂无评论