The problem of sequential probability assignment for individual sequences is investigated. We compare the probabilities assigned by any sequential scheme to the performance of the best ''batch'' scheme...
详细信息
The problem of sequential probability assignment for individual sequences is investigated. We compare the probabilities assigned by any sequential scheme to the performance of the best ''batch'' scheme (model) in some class. For the class of finite-state schemes and other related families, we derive a deterministic performance bound, analogous to the classical (probabilistic) Minimum Description Length (MDL) bound. It holds for ''most'' sequences, similarly to the probabilistic setting, where the bound holds for ''most'' sources in a class. It is shown that the bound can be attained both pointwise and sequentially for any model family in the reference class and without any prior knowledge of its order. This is achieved by a universal scheme based on a mixing approach. The bound and its sequential achievability establish a completely deterministic significance to the concept of predictive MDL.
We consider finite-alphabet and real-valued time series and the following four problems: i) estimation of the (limiting) probability P(x(0) ... x(s)) for every s and each sequence x(0) ... x(s) of letters from the pro...
详细信息
We consider finite-alphabet and real-valued time series and the following four problems: i) estimation of the (limiting) probability P(x(0) ... x(s)) for every s and each sequence x(0) ... x(s) of letters from the process alphabet (or estimation of the density p(x(0), ... , x(s)) for real-valued time series), ii) the so-called on-line prediction, where the conditional probability P(x(t+1)vertical bar x(1)x(2) ... x(t)) (or the conditional density p(x(t+1)vertical bar x(1)x(2) ... x(t))) should be estimated, where x(1)x(2) ... x(t) are given, iii) regression and iv) classification (or so-called problems with side information). We show that Kolmogorov complexity (KC) and universal codes (or universal data compressors), whose codeword length can be considered as an estimation of KC, can be used as a basis for constructing asymptotically optimal methods for the above problems. (By definition, a universal code can "compress" any sequence generated by a stationary and ergodic source asymptotically to the Shannon entropy of the source).
In this paper an irreducible parameterization for a finite memory source is constructed in the form of a tree machine. A universal information source for the set of finite memory sources is constructed by a predictive...
详细信息
In this paper an irreducible parameterization for a finite memory source is constructed in the form of a tree machine. A universal information source for the set of finite memory sources is constructed by a predictive modification of an earlier-studied algorithm-Context. It is shown that this universal source incorporates any minimal data-generating tree machine in an asymptotically optimal manner in the following sense: the negative logarithm of the probability it assigns to any long typical sequence, generated by any tree machine, approaches that assigned by the tree machine at the best possible rate.
We propose a new kernel for strings which borrows ideas and techniques from information theory and data compression. This kernel can be used in combination with any kernel method, in particular Support Vector Machines...
详细信息
We propose a new kernel for strings which borrows ideas and techniques from information theory and data compression. This kernel can be used in combination with any kernel method, in particular Support Vector Machines for string classification. with notable applications in proteomics. By using a Bayesian averaging framework with conjugate priors on a class of Markovian models known as probabilistic suffix trees or context-trees, we compute the value of this kernel in linear time and space while only using the information contained in the spectrum of the considered strings. This is ensured through an adaptation of a compression method known as the context-tree weighting algorithm. Encouraging classification results are reported on a standard protein homology detection experiment, showing that the context-tree kernel performs well with respect to other state-of-the-art methods while using no biological prior knowledge. (c) 2005 Elsevier Ltd. All rights reserved.
This paper concerns the rates of power law growth of mutual information computed for a stationary measure or for a universal code. The rates are called Hilberg exponents, and four such quantities are defined for each ...
详细信息
This paper concerns the rates of power law growth of mutual information computed for a stationary measure or for a universal code. The rates are called Hilberg exponents, and four such quantities are defined for each measure and each code: two random exponents and two expected exponents. A particularly interesting case arises for the conditional algorithmic mutual information. In this case, the random Hilberg exponents are almost surely constant on ergodic sources and are bounded by the expected Hilberg exponents. This property is the second-order analog of the Shannon-McMillan-Breiman theorem, proved without invoking the ergodic theorem. It carries over to Hilberg exponents for the underlying probability measure via Shannon-Fano coding and Barron inequality. Moreover, the expected Hilberg exponents can be linked for different universal codes. Namely, if one code dominates another, the expected Hilberg exponents are greater for the former than for the latter. This paper is concluded by an evaluation of Hilberg exponents for certain sources, such as the mixture Bernoulli process and the Santa Fe processes.
The problem of predicting the next outcome of an individual binary sequence under the constraint that the universal predictor has a finite memory, is explored. In this analysis, the finite-memory universal predictors ...
详细信息
The problem of predicting the next outcome of an individual binary sequence under the constraint that the universal predictor has a finite memory, is explored. In this analysis, the finite-memory universal predictors are either deterministic or random time-invariant finite-state (FS) machines with K states (K-state machines). The paper provides bounds on the asymptotic achievable regret of these constrained universal predictors as a function of K, the number of their states, for long enough sequences. The specific results are as follows. When the universal predictors are deterministic machines, the comparison class consists of constant predictors, and prediction is with respect to the 0-1 loss function (Hamming distance), we get tight bounds indicating that the optimal asymptotic regret is 1/(2K). In that case of K-state deterministic universal predictors, the constant predictors comparison class, but prediction is with respect to the self-information (code length) and the square-error loss functions, we show an upper bound on the regret (coding redundancy) of O(K-2/3) and a lower bound of Theta(K-4/5). For these loss functions, if the predictor is allowed to be a random K-state machine, i.e., a machine with random state transitions, we get a lower bound of Theta((1)/(K)) on the regret, with a matching upper bound of O ((1)/(K)) for the square-error loss, and an upper bound of O ((log K)/(K))(1) for the self-information loss. In addition, we provide results for all these loss functions in the case where the comparison class consists of all predictors that are order-L Markov machines.
The nascent field of compressed sensing is founded on the fact that high-dimensional signals with simple structure can be recovered accurately from just a small number of randomized samples. Several specific kinds of ...
详细信息
The nascent field of compressed sensing is founded on the fact that high-dimensional signals with simple structure can be recovered accurately from just a small number of randomized samples. Several specific kinds of structures have been explored in the literature, from sparsity and group sparsity to low-rankness. However, two fundamental questions have been left unanswered. What are the general abstract meanings of structure and simplicity? Do there exist universal algorithms for recovering such simple structured objects from fewer samples than their ambient dimension? In this paper, we address these two questions. Using algorithmic information theory tools such as the Kolmogorov complexity, we provide a unified definition of structure and simplicity. Leveraging this new definition, we develop and analyze an abstract algorithm for signal recovery motivated by Occam's Razor. Minimum complexity pursuit (MCP) requires approximately 2 kappa randomized samples to recover a signal of complexity. and ambient dimension n. We also discuss the performance of the MCP in the presence of measurement noise and with approximately simple signals.
universal compression of patterns of sequences generated by independent and identically distributed (i.i.d.) sources with unknown. possibly large, alphabets is investigated. A pattern is a sequence of indices that con...
详细信息
universal compression of patterns of sequences generated by independent and identically distributed (i.i.d.) sources with unknown. possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. This pattern can in turn be compressed by itself. It is shown that if the alphabet size k is essentially small, then the average minimax and maximin redundancies as well as the redundancy of every code for almost every source, when compressing a pattern, consist of at least 0.5 log(n/k(3)) bits per each unknown probability parameter, and if all alphabet letters are likely to occur, there exist codes whose redundancy is at most 0.5 log(n/k(2)) bits per each unknown probability parameter, where n is the length of the data sequences. Otherwise, if the alphabet is large, these redundancies are essentially at least Theta(n(-2/3)) bits per symbol, and there exist codes that achieve redundancy of O(n(-1/2)) bits per symbol. Two suboptimal low-complexity sequential algorithms for compression of patterns are presented and their description lengths analyzed, also pointing out that the pattern average universal description length can decrease below the underlying i.i.d. entropy for large enough alphabets.
We investigate on-line prediction of individual sequences. Given a class of predictors, the goal is to predict as well as the best predictor in the class, where the loss is measured by the self information (logarithmi...
详细信息
We investigate on-line prediction of individual sequences. Given a class of predictors, the goal is to predict as well as the best predictor in the class, where the loss is measured by the self information (logarithmic) loss function. The excess loss (regret) is closely related to the redundancy of the associated lossless universal code. Using Shtarkov's theorem and tools from empirical process theory, we prove a general upper bound on the best possible (minimax) regret. The bound depends on certain metric properties of the class of predictors. We apply the bound to both parametric and nonparametric classes of predictors. Finally, we point out a suboptimal behavior of the popular Bayesian weighted average algorithm.
We construct new self-dual and isodual codes over the integers module 4. The binary images of these codes under the Gray map are nonlinear, but formally self-dual. The construction involves Hensel lifting of binary cy...
详细信息
We construct new self-dual and isodual codes over the integers module 4. The binary images of these codes under the Gray map are nonlinear, but formally self-dual. The construction involves Hensel lifting of binary cyclic codes. Quaternary quadratic residue codes are obtained by Hensel lifting of the classical binary quadratic residue codes. Repeated Hensel lifting produces a universal code defined over the 2-adic integers. We investigate the connections between this universal code and the codes defined over Z/sub 4/, the composition of the automorphism group, and the structure of idempotents over Z/sub 4/. We also derive a square root bound on the minimum Lee weight, and explore the connections with the finite Fourier transform. Certain self-dual codes over Z/sub 4/ are shown to determine even unimodular lattices, including the extended quadratic residue code of length q+1, where q/spl equiv/-1(mod8) is a prime power. When q=23, the quaternary Golay code determines the Leech lattice in this way. This is perhaps the simplest construction for this remarkable lattice that is known.
暂无评论