The application of adaptive vector predictive coding to very-low-bit-rate speech encoders was investigated and its performance quantified. The input speech samples were analyzed using the government-standard LPC-10 algorithm to generate a vector of 10 LPC coefficients, which served as the input for vector prediction. Three prediction schemes were investigated: time-invariant prediction, switched prediction, and continuous prediction. The assumption of independence among the LPC coefficients was also studied. The results indicate that the performance of the time-invariant and independent prediction schemes is satisfactory for very-low-bit-rate speech encoding applications.
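As a rough illustration of the time-invariant, independent case described above, the following sketch codes a sequence of coefficient vectors in a closed loop. It is not the paper's coder: the function names, the single fixed gain `a`, and the uniform quantizer step `step` are all assumptions made for the example.

```python
def encode_frames(frames, a=0.9, step=0.05):
    """Closed-loop predictive coding of a sequence of coefficient vectors.

    Each coefficient is predicted only from its own reconstructed value in
    the previous frame (the "independent" assumption) using one fixed gain
    `a` (the "time-invariant" case). `a` and `step` are illustrative values.
    """
    prev = [0.0] * len(frames[0])  # decoder state, mirrored in the encoder
    indices, recon = [], []
    for frame in frames:
        pred = [a * p for p in prev]
        # Quantize the prediction residual with a uniform step.
        q = [round((x - ph) / step) for x, ph in zip(frame, pred)]
        rec = [ph + step * k for ph, k in zip(pred, q)]
        indices.append(q)
        recon.append(rec)
        prev = rec  # predict the next frame from reconstructed values
    return indices, recon


def decode_frames(indices, dim, a=0.9, step=0.05):
    """Rebuild the vectors from the quantizer indices alone."""
    prev = [0.0] * dim
    out = []
    for q in indices:
        pred = [a * p for p in prev]
        rec = [ph + step * k for ph, k in zip(pred, q)]
        out.append(rec)
        prev = rec
    return out
```

Because the encoder predicts from its own reconstruction, the decoder tracks it exactly and the per-coefficient error stays within half a quantizer step.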
This paper presents an adaptive non-linear method for the predictive coding of images using multilayer perceptrons. By incorporating causal and localised training on the actual data being coded, rather than training on separate data, the network weights are continuously updated. This results in a highly adaptive predictor, with localised optimisation based on stochastic gradient learning. The causal nature of the training means that no transmission overhead is required, and it also enables lossless coding of the images. In addition to the adaptive prediction, the results presented here incorporate an arithmetic coding scheme, producing results that are better than CALIC and comparable to TMW, the state-of-the-art lossless compression method in the literature. This shows that near-optimal results can be obtained with the fundamental concept of adaptive training. The use of a neural network provides a simple means of performing this training.
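The key property above is that training uses only already-coded pixels, so the decoder can repeat every weight update without side information. The sketch below illustrates that idea with a much simpler stand-in: a three-tap linear predictor updated by normalised LMS in place of the paper's multilayer perceptron. The neighbourhood, the initial weights, and the step size `mu` are assumptions for the example, and the arithmetic coder is omitted.

```python
def _neighbours(img, r, c):
    # Causal context: west, north, north-west (0 outside the image).
    w = img[r][c - 1] if c > 0 else 0
    n = img[r - 1][c] if r > 0 else 0
    nw = img[r - 1][c - 1] if r > 0 and c > 0 else 0
    return [w, n, nw]


def _update(weights, ctx, pixel, mu):
    # Normalised LMS step, driven only by data the decoder also has.
    pred = sum(wi * xi for wi, xi in zip(weights, ctx))
    err = pixel - pred
    norm = 1.0 + sum(xi * xi for xi in ctx)
    return [wi + mu * err * xi / norm for wi, xi in zip(weights, ctx)]


def encode(img, mu=0.5):
    weights = [1.0, 1.0, -1.0]  # start from the planar predictor W + N - NW
    residuals = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            ctx = _neighbours(img, r, c)
            pred = round(sum(wi * xi for wi, xi in zip(weights, ctx)))
            row.append(img[r][c] - pred)  # integer residual for the entropy coder
            weights = _update(weights, ctx, img[r][c], mu)
        residuals.append(row)
    return residuals


def decode(residuals, mu=0.5):
    h, w = len(residuals), len(residuals[0])
    img = [[0] * w for _ in range(h)]
    weights = [1.0, 1.0, -1.0]
    for r in range(h):
        for c in range(w):
            ctx = _neighbours(img, r, c)
            pred = round(sum(wi * xi for wi, xi in zip(weights, ctx)))
            img[r][c] = residuals[r][c] + pred
            weights = _update(weights, ctx, img[r][c], mu)
    return img
```

Encoder and decoder perform the same arithmetic in the same order, so the weights stay in step and the reconstruction is exact.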
Adaptive predictive coding of digitized images using multiplicative autoregressive (MAR) models is discussed. Three MAR models, designated as nonsymmetric half plane (NSHP) (3×3), quarter plane (QP) (2×3), and NSHP (2×3), are studied in detail. Results demonstrate that both NSHP (3×3) and QP (2×3) are very effective for coding and transmission of such images at bit rates less than one bit per pixel. Comparison with a 2-D model that has a quarter plane 2×2 region of support indicates that the performance of NSHP (3×3) and QP (2×3) either exceeds or matches that of the former. The proposed scheme has the following advantages. First, the signal-to-noise ratio and the bit rate attainable with this method are comparable to those of two-dimensional (2-D) predictive techniques. Second, unlike the 2-D schemes, the stability of the predictive coder is easily guaranteed.
This paper presents a hybrid scheme for lossless compression of X-ray non-destructive testing (NDT) images of aircraft components. The method combines predictive coding and the integer wavelet transform (IWT). Furthermore, component CAD models are used to divide the X-ray images into regions based on material structure, and the design of the predictors and the choice of the IWT are optimised for the specific image features contained in each region of uniform material structure. The proposed hybrid scheme is demonstrated on a real X-ray image of a practical aircraft component and shown to offer a significantly higher compression ratio than other lossless compression schemes.
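An integer wavelet transform maps integers to integers and is exactly invertible, which is what makes it usable for lossless coding. The abstract does not say which IWT is chosen per region, so the sketch below assumes the simplest one, a single level of the integer Haar (S) transform computed by lifting; the function names are illustrative.

```python
def iwt_forward(x):
    """One level of the integer Haar (S) transform on an even-length list."""
    assert len(x) % 2 == 0
    approx, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = a - b        # detail: difference of the pair
        s = b + d // 2   # approximation: floor of the pair average
        approx.append(s)
        detail.append(d)
    return approx, detail


def iwt_inverse(approx, detail):
    """Undo iwt_forward exactly, using the same floor division."""
    out = []
    for s, d in zip(approx, detail):
        b = s - d // 2
        a = d + b
        out.extend([a, b])
    return out
```

For example, the pair (10, 12) becomes approximation 11 and detail -2, and the inverse recovers (10, 12) with no rounding loss.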
An efficient scalable predictive coding method is proposed for the Wyner-Ziv problem, using nested lattice quantization followed by multi-layer Slepian-Wolf coders (SWC) with layered side information. The proposed coder supports an embedded representation and achieves high coding efficiency by exploiting the high-quality version of the previous frame in the enhancement-layer coding of the current frame. Specifically, the decoder generates the enhancement-layer side information with an estimation approach that takes into account all the information available to the enhancement layer. At the encoder, a practical switching algorithm simplifies the correlation estimation for the channel code design by assuming either the current reconstructed base-layer frame or the prior enhancement-layer reconstruction as side information. Experiments based on a DPCM model show great benefits to the enhancement-layer reconstruction. The paper also discusses the possible adaptation of this approach to practical video compression.
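The nested quantization idea can be shown in one dimension: the encoder quantizes finely but transmits only the index modulo `m`, and the decoder resolves the ambiguity with its side information. This is a toy sketch under stated assumptions, not the paper's coder: it uses a scalar lattice, plain modulo binning in place of Slepian-Wolf channel codes, and hypothetical names `wz_encode` and `wz_decode`.

```python
def wz_encode(x, step, m):
    """Quantize x on the fine lattice and keep only the coset index (0..m-1)."""
    return round(x / step) % m


def wz_decode(idx, side, step, m):
    """Pick the fine-lattice point in coset `idx` that lies closest to `side`."""
    base = round(side / step)
    # Offset from `base`, congruent to idx - base (mod m), in [-m//2, m - 1 - m//2].
    offset = ((idx - base + m // 2) % m) - m // 2
    return (base + offset) * step
```

Decoding is correct as long as the source and the side information differ by less than about `m / 2` fine steps; a larger mismatch selects the wrong coset member, which is why the correlation between the two must be estimated when the code is designed.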
ISBN (print): 0780329120
In this paper we present a fuzzy-logic-based nonlinear predictor for predictive coding of images. We define five local structure patterns of images: uniform area, horizontal contour (0°), vertical contour (90°), and 45° and 135° diagonal contours. Their membership functions are derived with the gradient-based edge detection method, and predicted values for the different patterns are defined by linear extrapolation from available neighborhood pixel values. The predicted value of the current pixel is obtained from the membership functions and the predicted values defined for the different patterns. A set of parameters that characterize the proposed fuzzy predictor is determined from empirical data.
A generic nonlinear autoregressive (AR) model for a random time series is presented. The model is obtained by a nonlinear predictive coding (NLPC) approach which expresses the minimum mean square error estimate of the current value of the series as a Volterra series in terms of its immediate N preceding values. This Volterra series is assumed to belong to a generalized Fock Hilbert space F. In the second stage, which is parametric, the model parameters, which are coefficients of a linear combination of known nonlinear random functions of the data, are obtained by linear mean square estimation. The implementations of the model and of the estimator appear respectively as two-layer recurrent and feedforward neural networks.
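For orientation, a Volterra series predictor over the N preceding values has the general form below. The kernel symbols h_0, h_1, h_2 are notation assumed here for illustration and are not taken from the paper.

```latex
\hat{x}_n = h_0
  + \sum_{i=1}^{N} h_1(i)\, x_{n-i}
  + \sum_{i=1}^{N} \sum_{j=1}^{N} h_2(i,j)\, x_{n-i}\, x_{n-j}
  + \cdots
```

Keeping only the constant and first-order terms recovers the ordinary linear AR predictor; the higher-order kernels carry the nonlinearity.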
ISBN (print): 0780381777
Speech feature extraction is one of the most important stages in the speech recognition process. In this paper, we propose a new neural network architecture called cooperative modular neural predictive coding (CMNPC). It is based on the interaction of discriminant experts, DFE-NPC (discriminant feature extraction), optimized for macro-classification with the help of a criterion: the modelisation error ratio (MER). We propose a theoretical validation of this model by linking the MER with a likelihood ratio. The performance of this architecture is estimated in a phoneme recognition task. The phonemes are extracted from the DARPA TIMIT speech database. Comparisons with other coding methods (LPC, MFCC, PLP) are presented and show an improvement in recognition rates.
We explore the performance of a two-dimensional (2-D) prediction-based LSF quantization method for both wide-band and telephone-band (narrow-band) speech. The 2-D prediction-based method exploits both the inter-frame and intra-frame correlations of LSF parameters. We show that a 4th-order 2-D predictor provides optimum prediction gain as well as improved quantization performance at various choices of frame shift for both wide-band and telephone-band speech. The existing one-dimensional (1-D) predictive method, which exploits only inter-frame correlation, performs poorly at larger frame shifts, whereas the proposed 2-D predictor provides lower spectral distortion and fewer outliers than existing memory-based and memory-less methods.
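To make the inter-frame and intra-frame distinction concrete, the sketch below predicts each element of a parameter vector from four neighbours: the same and adjacent indices in the previous frame, and the index just below in the current frame. The abstract does not say which four neighbours the 4th-order predictor uses, so this support, the tap names, and the open-loop (unquantized) residual are all assumptions.

```python
def predict_lsf(prev, cur, i, coeffs):
    """Predict element i of the current frame.

    prev   : previous frame's vector (inter-frame memory)
    cur    : elements 0..i-1 of the current frame (intra-frame memory)
    coeffs : (c_inter, c_intra, c_lo, c_hi), hypothetical predictor taps
    """
    c_inter, c_intra, c_lo, c_hi = coeffs
    below = cur[i - 1] if i > 0 else 0.0
    prev_lo = prev[i - 1] if i > 0 else 0.0
    prev_hi = prev[i + 1] if i + 1 < len(prev) else 0.0
    return c_inter * prev[i] + c_intra * below + c_lo * prev_lo + c_hi * prev_hi


def residuals(prev, frame, coeffs):
    """Prediction residuals for one frame; these would go to the quantizer."""
    return [frame[i] - predict_lsf(prev, frame[:i], i, coeffs)
            for i in range(len(frame))]
```

Setting `coeffs` to `(1, 0, 0, 0)` reduces this to a purely inter-frame (1-D) predictor, the case the abstract reports as degrading at larger frame shifts, when consecutive frames are less correlated.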
ISBN (digital): 9798350388855
ISBN (print): 9798350388862
Precise and efficient speech recognition methods are crucial for interactive human-machine communication, particularly on embedded devices with limited computational and storage resources, where deploying resource-intensive speech recognition systems is challenging. To address these issues, we improve the Contrastive Predictive Coding (CPC) algorithm by incorporating a self-attention mechanism, resulting in what we call Self-Attention Contrastive Predictive Coding (SACPC). The integration of the self-attention mechanism not only delivers outstanding performance but also reduces the model's parameter count, thus easing deployment on embedded devices. After pretraining on the open-source LibriSpeech-100h dataset and adding a backend model, we conducted tests on a proprietary dataset for a 20-class speech command recognition task, validating the model's improved accuracy.
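CPC trains its encoder with a contrastive (InfoNCE) objective: a context vector must score the true future representation above a set of negative samples. The sketch below computes that loss for a single prediction. It assumes a plain dot-product score rather than a learned bilinear map, does not model the self-attention component, and uses the hypothetical name `info_nce`.

```python
import math


def info_nce(context, positive, negatives):
    """Negative log-softmax score of the positive among positive + negatives."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    scores = [dot(context, positive)] + [dot(context, n) for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]
```

When the positive scores no higher than the negatives, the loss equals the log of the number of candidates; it falls toward zero as the positive score pulls ahead.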