The method of transform coding for image-data compression is generalized by regarding transform coding as a least-squares approximation of two-dimensional functions. By an orthogonalization of basis functions with respect to a particular segment shape, a generalized transform coding scheme is derived. The algorithm contains all block-oriented transforms as special cases and allows the construction of new transforms, e.g. polynomial or spline transforms. The theoretical results are converted into coder and decoder structures enabling region-oriented transform coding without a transmission of the orthogonal basis functions for each segment.
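The idea of orthogonalizing basis functions over an arbitrary segment shape can be illustrated with a minimal NumPy sketch. It is not the coder from the abstract: the monomial basis, the QR-based orthogonalization, and the names `orthonormal_basis_on_segment` and `code_segment` are assumptions chosen for illustration.

```python
import numpy as np

def orthonormal_basis_on_segment(mask, max_degree=2):
    """Orthonormalize the monomials x^i * y^j over the pixels where mask is True."""
    ys, xs = np.nonzero(mask)                     # pixel coordinates, row-major order
    h, w = mask.shape
    # Scale coordinates to [-1, 1] so the monomials stay well conditioned.
    x = 2.0 * xs / max(w - 1, 1) - 1.0
    y = 2.0 * ys / max(h - 1, 1) - 1.0
    cols = [x**i * y**j
            for i in range(max_degree + 1)
            for j in range(max_degree + 1 - i)]
    A = np.stack(cols, axis=1)                    # shape: (pixels, basis functions)
    Q, _ = np.linalg.qr(A)                        # columns are orthonormal on the segment
    return Q

def code_segment(image, mask, max_degree=2):
    """Least-squares approximation of the segment; returns coefficients and reconstruction."""
    Q = orthonormal_basis_on_segment(mask, max_degree)
    coeffs = Q.T @ image[mask]                    # projection equals the least-squares fit
    approx = np.zeros_like(image, dtype=float)
    approx[mask] = Q @ coeffs
    return coeffs, approx
```

In this sketch `Q` depends only on the mask and the chosen degree, so a decoder that knows the segment shape could recompute the same basis instead of receiving it. That mirrors the property claimed in the abstract, though the actual coder and decoder structures may differ.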
It has been shown [1] that an analysis/synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially indistinguishable from the original. By exploiting the peak-to-peak correlation of the sine-wave amplitudes [2], a harmonic model for the sine-wave frequencies, and a predictive model for the sine-wave phases [3], it has also been shown that the sine-wave parameters can be coded at 8 kbps. In this paper a new technique is described for coding the sine-wave amplitudes based on the idea of a pitch-adaptive channel vocoder. Using this amplitude-coding strategy and operating at a total bit rate of 4.8 kbps, it was possible to code and transmit enough phase information so that very intelligible, natural sounding speech could be synthesized. This 4.8 kbps system has been implemented in real-time and has achieved a Diagnostic Rhyme Test (DRT) score of 95. At 2.4 kbps no explicit phase information could be coded, but by phase-locking all of the sine waves to the fundamental, by adding a pitch-adaptive quadratic phase, and by adding a voicing dependent random phase to each sine wave, natural sounding synthetic speech could be obtained. This new system is currently being implemented in real-time so that intelligibility tests can be performed.
In low-bitrate audio coding, modern coders often rely on efficient parametric techniques to enhance the performance of the waveform preserving transform coder core. While the latter features well-known perceptually adapted quantization of spectral coefficients, parametric techniques reconstruct the signal parts that have been quantized to zero by the encoder to meet the low-bitrate constraint. Large numbers of zeroed spectral values and especially consecutive zeros constituting gaps often lead to audible artifacts at the decoder. To avoid such artifacts the new 3GPP Enhanced Voice Services (EVS) coding standard utilizes noise filling and intelligent gap filling (IGF) techniques, guided by spectral envelope information. In this paper the underlying considerations of the parametric energy adjustment and transmission in EVS and its relation to noise filling, IGF, and tonality preservation are presented. It is further shown that complex-valued IGF envelope calculation in the encoder improves the temporal energy stability of some signals while retaining real-valued decoder-side processing.
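The general principle of noise filling guided by envelope information can be sketched as follows. This is a simplified illustration and not the EVS algorithm: the band layout, the per-band energies sent as side information, and the function name `noise_fill` are all assumptions.

```python
import numpy as np

def noise_fill(quantized, band_edges, band_energies, rng):
    """Fill zero-quantized coefficients with noise so each band reaches its target energy."""
    out = quantized.astype(float).copy()
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), target in zip(bands, band_energies):
        band = out[lo:hi]                       # view into out, edited in place
        zeros = band == 0
        missing = target - np.sum(band ** 2)    # energy lost to quantization
        if zeros.any() and missing > 0:
            noise = rng.standard_normal(int(zeros.sum()))
            noise *= np.sqrt(missing / np.sum(noise ** 2))
            band[zeros] = noise                 # nonzero coefficients stay untouched
    return out
```

Only the zeroed positions are altered, so the waveform-coded coefficients are preserved while the band energy is restored to the transmitted value.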
A technique for sine-wave synthesis is described that uses the fast Fourier transform overlap-add method at a 100 Hz rate based on sine-wave parameters coded at a 50 Hz rate. This technique leads to an implementation requiring less than one-half the computational power of a digital-signal-processor chip. The synthesis method implicitly introduces a frequency jitter which renders the encoded synthetic speech more natural. For speech corrupted by additive acoustic noise, the synthesizer, in conjunction with straightforward noise suppression, greatly improves the quality of the synthetic speech, rendering the sinusoidal transform coder (STC) algorithm a truly robust system. More recent architecture studies of the STC algorithm suggest that an entire implementation requires no more than two ADSP2100 chips.
A progressive image transmission scheme is presented which combines block transforms, a quadtree data structure and vector quantization. The experimental results demonstrate that the scheme achieves lossless progressive transmission with compression. The quadtree data structure makes it possible to transmit the images progressively. Compression is obtained by using vector quantization on each level. Lossless reproduction is guaranteed by delivering the residual errors due to quantization from high level to low level and using an entropy coder on the final residual error image.
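The mechanism of passing quantization errors from coarse to fine levels and closing with an exact residual can be sketched as below. Several simplifications are assumed: block means stand in for the block transform, a uniform scalar quantizer stands in for vector quantization, the entropy coder is omitted, and the names `encode` and `decode` are hypothetical.

```python
import numpy as np

def block_mean(img, size):
    """Average over non-overlapping size x size blocks (one quadtree level)."""
    h, w = img.shape
    return img.reshape(h // size, size, w // size, size).mean(axis=(1, 3))

def upsample(img, factor):
    """Repeat each value over a factor x factor block."""
    return np.kron(img, np.ones((factor, factor)))

def encode(image, levels=3, step=8.0):
    """Return coarse-to-fine quantized layers plus the final integer residual."""
    recon = np.zeros(image.shape)
    layers = []
    for level in range(levels, 0, -1):
        size = 2 ** level
        # The error left by coarser levels is carried into this level's target.
        error = block_mean(image.astype(float), size) - block_mean(recon, size)
        q = np.round(error / step).astype(int)   # scalar stand-in for VQ indices
        layers.append(q)
        recon = recon + upsample(q * step, size)
    residual = image.astype(int) - np.round(recon).astype(int)
    return layers, residual

def decode(layers, residual, levels=3, step=8.0):
    """Rebuild the image; decoding fewer layers would give a coarser preview."""
    recon = np.zeros(residual.shape)
    for level, q in zip(range(levels, 0, -1), layers):
        recon = recon + upsample(q * step, 2 ** level)
    return np.round(recon).astype(int) + residual
```

The image sides must be divisible by `2 ** levels`. Because the decoder repeats the encoder's reconstruction steps exactly, adding the final residual recovers the original integer image.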
Summary form only given. We designed a family of integer-to-integer (i2i) approximations to the Cartesian-to-polar transformation and analyzed its behavior for high-rate transform coding. Denoting (ordinary, continuous) polar coordinates by (r, θ), our precise high-rate analysis relates the performance to the differential entropies of r² and θ, which are often easy to evaluate. One may thus predict when there is an improvement over linear transform coding. The analysis matches our simulations for coding of Gaussian scale mixtures and other polar-separable sources. The advantage over the best linear transform coder can be large. Our hope is to extend the polar-coordinate results to a general theory for nonlinear transform coding based on i2i implementations of arbitrary nonlinear transformations.
Deblocking is required when the data rate of a video stream is low. Images taken from such video streams also contain blocking artifacts. Similarly, when images are compressed by ICA (Independent Component Analysis) transform coding, the resulting decompressed images are blocky. This paper describes the outcome of applying deblocking filters to ICA-decoded images using two schemes, namely “deblocking filter for low bit rate MPEG-4 video” and “low complexity deblocking method for DCT coded video signals”. The experiment is conducted on three sets of images at different quantization parameters. According to the results, the deblocking filter of scheme 1 provides a better SNR than that of scheme 2.
A new vector quantization scheme in the discrete cosine transform domain, named DCT-VQ, and its application to color image coding are described. In this scheme, the DCT domain is partitioned into vectors which are normalized and vector-quantized using universal vector quantizers designed with a multidimensional Laplacian distribution. An adaptive coding scheme is also introduced to obtain better reconstruction of images. The color image coder employs the above scheme and separately encodes three components converted from the R, G, B signals. The simulations have shown that adaptive DCT-VQ exhibits better performance than conventional adaptive cosine transform coding with scalar quantization. The decomposition of the DCT block into vectors results in a much less complex coder than a vector quantizer in the original space domain.
Sinusoidal transform coding (STC) techniques model speech as the sum of sine waves whose frequencies, amplitudes and phases are specified at regular intervals. To achieve a low-bit-rate representation, only the spectral envelope is encoded and the phases are regenerated according to a minimum phase assumption. In this paper, the inaccuracy of the minimum phase model is demonstrated. It is shown that the phase spectra of decoded speech segments may be corrected using either the parameters of a Rosenberg pulse model or a second-order all-pass filter. Experiments have shown that by applying this correction, the phase accuracy increases and the speech quality improves.
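The minimum phase assumption mentioned above can be made concrete with a short NumPy sketch. It regenerates a phase spectrum from a magnitude envelope using the standard cepstral folding method and then sums harmonically related sine waves. The function names and the harmonic synthesis are illustrative assumptions; the phase corrections studied in the paper (Rosenberg pulse model, all-pass filter) are not implemented here.

```python
import numpy as np

def minimum_phase_from_magnitude(magnitude):
    """Phase implied by a minimum-phase system with the given magnitude.

    magnitude: |H| sampled on the full N-point FFT grid, N even, strictly positive.
    """
    n = len(magnitude)
    cepstrum = np.fft.ifft(np.log(magnitude)).real   # real cepstrum of log|H|
    folded = np.zeros(n)
    folded[0] = cepstrum[0]
    folded[1:n // 2] = 2.0 * cepstrum[1:n // 2]       # fold anticausal part onto causal part
    folded[n // 2] = cepstrum[n // 2]
    return np.fft.fft(folded).imag                    # imaginary part of log H_min is the phase

def synthesize(amplitudes, phases, f0, fs, num_samples):
    """Sum of harmonics k*f0 with the given amplitudes and phase offsets."""
    t = np.arange(num_samples) / fs
    k = np.arange(1, len(amplitudes) + 1)[:, None]
    return np.sum(amplitudes[:, None] * np.cos(2 * np.pi * k * f0 * t + phases[:, None]), axis=0)
```

For a filter that really is minimum phase, the regenerated phase matches the true phase up to numerical error. Real speech frames generally deviate from this model, which is the inaccuracy the paper sets out to demonstrate.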
The work presented in this paper is a detailed analysis of various transforms, namely the Discrete Cosine transform, Singular Value Decomposition, Discrete Hadamard transform, Slant transform and Discrete Haar transform, which are applied to a set of biomedical images to achieve image compression. The operations on the images are performed in the transform domain, where the DC coefficients are stored and a truncation operation is performed by setting a corresponding threshold to achieve the desired PSNR and so maintain the quality of reconstruction. In this paper, the biomedical images are subjected to all of the compression schemes mentioned, with the PSNR set to 25 dB and 30 dB. The reconstruction qualities for both settings (25 dB and 30 dB) are tabulated, and a detailed analysis is done based on the quality of reconstruction, which sheds light on the optimal transform. The metrics used for the analysis are Mean Square Error, Peak Signal to Noise Ratio, Structural Similarity Index, Compression Ratio, Energy Compaction and Auto-Correlation.
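Truncating transform coefficients until a target PSNR is met can be sketched for the DCT case as follows. The selection rule (keep the largest-magnitude coefficients), the 8-bit peak value of 255, and the function names are assumptions; the paper's exact thresholding procedure is not given in the abstract.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def psnr(original, reconstructed, peak=255.0):
    mse = np.mean((original - reconstructed) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def compress_to_psnr(image, target_db, peak=255.0):
    """Keep the fewest largest DCT coefficients whose reconstruction reaches target_db."""
    img = image.astype(float)
    Cr, Cc = dct_matrix(img.shape[0]), dct_matrix(img.shape[1])
    coeffs = Cr @ img @ Cc.T                               # separable 2-D DCT
    energy = np.sort(coeffs.ravel() ** 2)[::-1]            # largest first
    # Parseval: for an orthonormal transform the squared error equals the discarded energy.
    tail_mse = np.clip(energy.sum() - np.cumsum(energy), 0.0, None) / img.size
    target_mse = peak ** 2 / 10.0 ** (target_db / 10.0)
    count = int(np.argmax(tail_mse <= target_mse)) + 1     # first count meeting the target
    threshold = energy[count - 1]
    kept = np.where(coeffs ** 2 >= threshold, coeffs, 0.0)
    recon = Cr.T @ kept @ Cc                               # inverse 2-D DCT
    num_kept = max(int(np.count_nonzero(kept)), 1)
    return recon, num_kept, img.size / num_kept, psnr(img, recon, peak)
```

The third return value is a coefficient-count ratio, used here as a rough proxy for compression ratio. An actual bit-rate figure would also depend on quantization and entropy coding, which this sketch leaves out.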