We propose a rigorous analysis approach for the subset-sum problem in the context of lossless data compression, where the phase transition of the subset-sum problem is directly related to the passage between ambiguous...
详细信息
We propose a rigorous analysis approach for the subset-sum problem in the context of lossless data compression, where the phase transition of the subset-sum problem is directly related to the passage between ambiguous and non-ambiguous decompression, for a compression scheme that is based on specifying the sequence composition. The proposed analysis lends itself to straightforward extensions in several directions of interest, including non-binary alphabets, incorporation of side information at the decoder (Slepian-Wolf coding), and coding schemes based on multiple subset sums. It is also demonstrated that the proposed technique can be used to analyze the critical behavior in a more involved situation where the sequence composition is not specified by the encoder.
A signal model called joint sparse model 2 (JSM-2) or the multiple measurement vector problem, in which all sparse signals share their support, is important for dealing with practical signal processing problems. In th...
详细信息
A signal model called joint sparse model 2 (JSM-2) or the multiple measurement vector problem, in which all sparse signals share their support, is important for dealing with practical signal processing problems. In this paper, we investigate the typical reconstruction performance of noisy measurement JSM-2 problems for l(2,1)-norm regularized least square reconstruction and the Bayesian optimal reconstruction scheme in terms of mean square error. Employing the replica method, we show that these schemes, which exploit the knowledge of the sharing of the signal support, can recover the signals more precisely as the number of channels increases. In addition, we compare the reconstruction performance of two different ensembles of observation matrices: one is composed of independent and identically distributed random Gaussian entries and the other is designed so that row vectors are orthogonal to one another. As reported for the single-channel case in earlier studies, our analysis indicates that the latter ensemble offers better performance than the former ones for the noisy JSM-2 problem. The results of numerical experiments with a computationally feasible approximation algorithm we developed for this study agree with the theoretical estimation.
In this paper we exploit concepts of information theory to address the fundamental problem of identifying and de. ning the most suitable tools for extracting, in a automatic and agnostic way, information from a generi...
详细信息
In this paper we exploit concepts of information theory to address the fundamental problem of identifying and de. ning the most suitable tools for extracting, in a automatic and agnostic way, information from a generic string of characters. We introduce in particular a class of methods which use in a crucial way data compression techniques in order to de. ne a measure of remoteness and distance between pairs of sequences of characters ( e. g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of dictionary of a given sequence and of artificial text and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method that applies to any kind of corpora of character strings independently of the type of coding behind them. We consider as a case study linguistic motivated problems and we present results for automatic language recognition, authorship attribution and self-consistent classification.
暂无评论