Predictive coding (PC) is a leading theory of cortical function that has previously been shown to explain a great deal of neurophysiological and psychophysical data. Here it is shown that PC can perform almost exact Bayesian inference when applied to computing with population codes. It is demonstrated that the proposed algorithm, based on PC, can: decode probability distributions encoded as noisy population codes; combine priors with likelihoods to calculate posteriors; perform cue integration and cue segregation; perform function approximation; be extended to perform hierarchical inference; simultaneously represent and reason about multiple stimuli; and perform inference with multi-modal and non-Gaussian probability distributions. PC thus provides a neural network-based method for performing probabilistic computation and provides a simple, yet comprehensive, theory of how the cerebral cortex performs Bayesian inference.
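As a point of reference for "combine priors with likelihoods" and "cue integration", the simplest case (two Gaussian sources of evidence about the same variable) has a closed-form answer. The sketch below is not the PC algorithm from the abstract; it is the textbook calculation that such a network would need to approximate, and the function name and numbers are illustrative assumptions.

```python
def combine_gaussians(mu_a, var_a, mu_b, var_b):
    """Renormalized product of two Gaussian densities over the same variable.

    Precisions (inverse variances) add, and the combined mean is the
    precision-weighted average of the two input means.
    """
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    mu = var * (mu_a / var_a + mu_b / var_b)
    return mu, var


# Assumed example: a broad prior centred on 0 and a sharper likelihood at 2.
prior = (0.0, 4.0)       # (mean, variance)
likelihood = (2.0, 1.0)  # (mean, variance)
post_mu, post_var = combine_gaussians(*prior, *likelihood)
print(post_mu, post_var)
```

The posterior mean lands closer to the more reliable (lower-variance) source, which is the signature of optimal cue integration.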
Typically, unsupervised segmentation of speech into phone-like and word-like units is treated as two separate tasks, often handled by different methods that do not fully leverage the interdependence of the two. Here, we unify them and propose a technique that can jointly perform both, showing that these two tasks indeed benefit from each other. Recent attempts employ self-supervised learning, such as contrastive predictive coding (CPC), where the next frame is predicted given past context. However, CPC only looks at the audio signal's frame-level structure. We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that models the signal structure at a higher level, e.g., the phone level. A convolutional neural network learns frame-level representations from the raw waveform via noise-contrastive estimation (NCE). A differentiable boundary detector finds variable-length segments, which are then used to optimize a segment encoder via NCE to learn segment representations. The differentiable boundary detector allows us to train the frame-level and segment-level encoders jointly. Experiments show that our single model outperforms existing phone and word segmentation methods on the TIMIT and Buckeye datasets. Finally, we use SCPC to extract speech features at the segment level rather than at the uniformly spaced frame level (e.g., 10 ms) and produce variable-rate representations that change according to the contents of the utterance. We lower the feature extraction rate from the typical 100 Hz to 14.5 Hz on average while still outperforming hand-crafted features such as MFCCs on the linear phone classification task.
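The contrastive objective behind CPC-style training can be illustrated with a few lines of code. This is a minimal sketch of the InfoNCE form commonly used with CPC, not the SCPC implementation: the dot-product score, the toy two-dimensional vectors, and all names are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(prediction, positive, negatives):
    """Negative log-softmax score of the true target among distractors.

    The loss is small when the prediction scores the positive sample
    higher than every negative sample.
    """
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    top = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum - scores[0]


pred = [1.0, 0.0]
# The prediction matches the positive frame: low loss.
good = info_nce_loss(pred, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
# The prediction matches a negative frame instead: high loss.
bad = info_nce_loss(pred, [-1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
print(good, bad)
```

In the abstract's framework the same kind of objective is applied twice, once to frame representations and once to segment representations; only the vectors being compared change.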
Federated learning can enable remote workers to collaboratively train a shared machine learning model while allowing training data to be kept locally. In the use case of wireless mobile devices, the communication overhead is a critical bottleneck due to limited power and bandwidth. Prior work has utilized various data compression tools such as quantization and sparsification to reduce the overhead. In this paper, we propose a predictive-coding-based compression scheme for federated learning. The scheme shares prediction functions among all devices and allows each worker to transmit a compressed residual vector derived from the reference. In each communication round, we select the predictor and quantizer based on the rate-distortion cost, and further reduce the redundancy with entropy coding. Extensive simulations reveal that the communication cost can be reduced by up to 99% with even better learning performance when compared with other baseline methods.
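The core idea of transmitting a quantized residual rather than the update itself can be sketched as follows. The predictor here is simply the previously reconstructed update and the quantizer is uniform; both are assumptions for illustration, and the rate-distortion-based selection and entropy coding described in the abstract are omitted.

```python
def quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    return [i * step for i in indices]


def encode(update, prediction, step):
    """Worker side: send only the quantized prediction residual."""
    residual = [u - p for u, p in zip(update, prediction)]
    return quantize(residual, step)


def decode(indices, prediction, step):
    """Server side: add the dequantized residual back to the shared prediction."""
    return [p + r for p, r in zip(prediction, dequantize(indices, step))]


previous = [0.50, -0.20, 0.10]  # last reconstructed update, known to both sides
current = [0.52, -0.18, 0.07]   # new local update to be transmitted
step = 0.05                     # assumed quantizer step size

symbols = encode(current, previous, step)
restored = decode(symbols, previous, step)
print(symbols, restored)
```

Because consecutive updates are similar, most residual symbols are zero or close to it, which is what makes a subsequent entropy coder effective.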
A vector quantization (VQ) scheme is proposed for image coding to achieve high compression ratios in visual communication applications. For still images, an adaptive tree structured vector quantization is proposed, and the picture quality of the reconstructed image can be adjusted accordingly by changing the threshold value. For image sequences, motion compensation is employed to reduce the variance of input vectors. Then vector quantizers are designed for prediction errors and utilized to reduce bit rate and to improve the reconstructed image quality. Studies of using subband VQ with motion compensation are also conducted, and satisfactory results are obtained in implementation complexity and compression ratios.
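For readers unfamiliar with VQ, the basic encode and decode steps look roughly like this. The sketch uses a flat full-search codebook, not the adaptive tree-structured quantizer the abstract proposes, and the codebook and pixel blocks are made-up values.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword with the smallest squared error."""
    def sq_dist(codeword):
        return sum((x - y) ** 2 for x, y in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


# Assumed toy codebook for 2x2 pixel blocks flattened to length-4 vectors.
codebook = [
    [0, 0, 0, 0],          # dark block
    [255, 255, 255, 255],  # bright block
    [0, 255, 0, 255],      # vertical edge
]
blocks = [[10, 5, 0, 12], [250, 240, 255, 251], [3, 249, 8, 240]]

indices = [nearest_codeword(b, codebook) for b in blocks]  # what is transmitted
reconstruction = [codebook[i] for i in indices]            # what the decoder shows
print(indices)
```

Each block is replaced by a single codebook index, so the bit rate depends on the codebook size rather than on the number of pixels per block.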
This paper incorporates trellis coded vector quantization (TCVQ) and forward adaptive predictive coding (APC) to form an efficient speech coding system operating at bit rates of 16 and 9.6 kb/s. The effectiveness of the system is studied for a variety of system parameters and utterances. Simulation results indicate that segmental signal-to-noise ratios as high as 23.8 and 15.4 dB are obtainable at 16 and 9.6 kb/s, respectively. The quality of the reconstructed speech is deemed to be excellent at 16 kb/s and very good at 9.6 kb/s. An algorithm for "optimizing" the residual codebooks is also presented.
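The segmental signal-to-noise ratio quoted above is the average of per-frame SNR values in dB. The sketch below shows one common way to compute it; the abstract does not give the paper's exact definition (frame length, handling of silent frames), so those choices are assumptions.

```python
import math


def segmental_snr_db(reference, decoded, frame_len):
    """Average the per-frame SNR (in dB) over non-overlapping frames."""
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0 or noise == 0:
            continue  # assumed convention: skip silent or perfectly coded frames
        frame_snrs.append(10 * math.log10(signal / noise))
    if not frame_snrs:
        raise ValueError("no frame with non-zero signal and noise energy")
    return sum(frame_snrs) / len(frame_snrs)


# Toy signal: two frames, each reproduced with a 10% amplitude error.
reference = [1.0] * 4 + [2.0] * 4
decoded = [0.9] * 4 + [1.8] * 4
seg_snr = segmental_snr_db(reference, decoded, frame_len=4)
print(seg_snr)
```

Averaging in the dB domain weights quiet and loud frames equally, which is why segmental SNR tracks perceived quality better than a single SNR over the whole utterance.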
Binary tree predictive coding uses a noncausal, shape-adaptive predictor to decompose an image into a binary tree of prediction errors and zero blocks. Fast compression performance is comparable with Joint Photographic Experts Group (JPEG) for photographs, with GIF for graphics, and superior to the state of the art for composite images.
Authors: GALAND, CR; MENEZ, JE; ROSSO, MM
Affiliations: UNIV NICE SOPHIA ANTIPOLIS, FRANCE; LASSY, CNRS URA 1376, EQUIPE 135, F-06041 NICE, FRANCE; IBM CORP, THOMAS J WATSON RES CTR, DEPT COMP SCI, YORKTOWN HTS, NY 10598; IBM CORP, DEPT ARCHITECTURE & TELECOMMUN, RES TRIANGLE PK, NC
Since its recent introduction by Atal and Schroeder, the code excited linear predictive (CELP) coder has been thoroughly and widely studied by the speech coding research community, and has already been adapted to several standards for telephone speech coding. The CELP algorithm represents a breakthrough in speech coding, for it encodes telephone quality speech at 8 kb/s without noticeable distortion. Previously, this performance was achievable by coders operating at 16 kb/s or higher. However, the drawback of the CELP is its inherent complexity, which, despite the fast progress of the technology, may represent a problem of cost or feasibility in products. In this paper, we discuss a new way to consider the CELP concept, which allows us to cut the processing load while keeping the same speech quality. Rather than performing the individual weighting of each candidate sequence, we propose a global implementation of the perceptual weighting function at the codebook level. As a result, the analysis-by-synthesis procedure does not require the processing of all the candidate sequences through the synthesis and weighting filters, and therefore the complexity requirement of the algorithm is much reduced. The new concept is carried out with an adaptive codebook. We report on two fixed-point implementations of our adaptive CELP (ACELP) algorithm: a 7.2 kb/s block coder (7 MIPS), and a 12 kb/s low-delay coder (11 MIPS). Both coders have been rated to provide the same quality as the 13 kb/s block coder adopted by the GSM for the European cellular telephone.
The reversible image steganographic scheme in this study provides the ability to embed secret data into a host image and then recover the host image without losing any information when the secret data is extracted. In this paper, a reversible image steganographic scheme based on predictive coding is proposed that embeds secret data into the compression codes during lossless image compression. The proposed scheme effectively provides a lossless hiding mechanism in the compression domain. During the predictive coding stage, the proposed scheme embeds secret data into error values by referring to a hiding-tree. In the entropy decoding stage, the secret data can be extracted by referring to the hiding-tree, and the host image can be recovered during the predictive decoding stage. The experimental results show that the average hiding capacity of the proposed scheme is 0.992 bits per pixel (bpp), and the host image can be reconstructed without losing any information when the secret data is extracted. © 2009 Elsevier B.V. All rights reserved.
Explicit expressions are derived for the conditional expectation and variance of the encoder in a predictive DPCM coder with N-level quantizer, when a stationary Ornstein-Uhlenbeck process is the source. A representation of the encoder in terms of a stochastic integral is presented. These expressions yield a nonlinear stochastic difference equation for the decoding error process and a stochastic differential equation (SDE) as a weak limit for the error process. The statistical properties of the error obtained as a solution of the limiting SDE are interpreted in terms of the slope overload error.
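The setting analysed above can be explored numerically with a small simulation. This is only a discrete-time sketch under assumed parameters: the Ornstein-Uhlenbeck source is sampled exactly on a time grid, the predictor coefficient is taken to be the one-step correlation, and the N-level quantizer is a uniform mid-rise design. None of these choices are taken from the paper.

```python
import math
import random


def simulate_ou(n, theta, sigma, dt, rng):
    """Sample a stationary OU process dX = -theta*X dt + sigma dW on a grid."""
    a = math.exp(-theta * dt)                 # one-step correlation
    stat_sd = sigma / math.sqrt(2 * theta)    # stationary standard deviation
    noise_sd = stat_sd * math.sqrt(1 - a * a)
    x = rng.gauss(0.0, stat_sd)
    path = []
    for _ in range(n):
        path.append(x)
        x = a * x + rng.gauss(0.0, noise_sd)
    return path, a


def quantize(error, levels, step):
    """Uniform mid-rise quantizer with an even number of output levels."""
    half = levels // 2
    k = math.floor(error / step)
    k = max(-half, min(half - 1, k))  # clamp: inputs beyond the range saturate
    return (k + 0.5) * step


def dpcm(path, a, levels, step):
    """Predict from the last reconstruction, quantize the prediction error."""
    recon_prev = 0.0
    recon, overloaded = [], []
    for x in path:
        pred = a * recon_prev
        err = x - pred
        overloaded.append(abs(err) > (levels // 2) * step)
        recon_prev = pred + quantize(err, levels, step)
        recon.append(recon_prev)
    return recon, overloaded


rng = random.Random(0)
path, a = simulate_ou(500, theta=1.0, sigma=1.0, dt=0.01, rng=rng)
levels, step = 8, 0.1
recon, overloaded = dpcm(path, a, levels, step)
print(sum(overloaded), "overloaded samples out of", len(path))
```

Samples flagged as overloaded are those where the prediction error exceeds the quantizer range, so the decoder can only follow the source at a bounded rate; this is the slope overload effect the abstract refers to. Everywhere else the reconstruction error stays within half a quantizer step.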
The human visual system (HVS) is a hierarchical system, in which visual signals are processed hierarchically. In this paper, the HVS is modeled as a three-level communication system and visual perception is divided into three stages according to the hierarchical predictive coding theory. Then, a novel just noticeable distortion (JND) estimation scheme is proposed. In visual perception, the input signals are predicted constantly and spontaneously in each hierarchy, and the neural response is evoked by the central residue and inhibited by the surrounding residues. These two types of residues are regarded as the positive and negative visual incentives, which cause positive and negative perception effects, respectively. In neuroscience, the effect of an incentive on an observer is measured by the surprise of that incentive. Thus, we propose a surprise-based method to measure both perception effects. Specifically, considering the biased competition of visual attention, we define the product of the residue self-information (i.e., surprise) and the competition biases as the perceptual surprise to measure the positive perception effect. The negative perception effect is measured by the average surprise (i.e., the local Shannon entropy). The JND threshold of each stage is estimated individually by considering both perception effects. The total JND threshold is finally obtained by non-linear superposition of the three stage thresholds. Furthermore, the proposed JND estimation scheme is incorporated into the Versatile Video Coding codec for image compression. Experimental results show that the proposed JND model outperforms the relevant existing ones, and that the bit rate can be reduced by over 16% without jeopardizing the perceptual quality.
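The two information measures named above, self-information (surprise) and local Shannon entropy, can be computed from a window of prediction residues as sketched below. Estimating probabilities from the empirical histogram of the window is an assumption, and the competition biases and the non-linear combination of stage thresholds from the paper are not reproduced.

```python
import math
from collections import Counter


def surprise_and_entropy(residuals):
    """Return per-value self-information (bits) and the window's entropy (bits)."""
    counts = Counter(residuals)
    total = len(residuals)
    prob = {value: count / total for value, count in counts.items()}
    surprise = {value: -math.log2(p) for value, p in prob.items()}
    # Entropy is the probability-weighted average of the surprise values.
    entropy = sum(p * surprise[value] for value, p in prob.items())
    return surprise, entropy


# Assumed toy window of quantized prediction residues around one pixel.
window = [0, 0, 0, 0, 1, 1, -1, 4]
surprise, entropy = surprise_and_entropy(window)
print(surprise, entropy)
```

Rare residue values carry high surprise, while a window with many different values has high entropy; the abstract uses the former for the central residue and the latter for its surroundings.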