A new scheme for sinusoidal audio coding, named multiple description spherical trellis-coded quantization, is proposed, and analytic expressions for the point densities and expected distortion of the quantizers are derived based on a high-resolution assumption. The proposed quantizers are of variable dimension, meaning that any number of sinusoids can be quantized jointly for each audio segment, whereby a lower distortion is achieved compared to previously published scalar spherical quantizers. The quantizers are designed to minimize a perceptual distortion measure subject to an entropy constraint for a given packet-loss probability. In experiments, the performance of the quantizers is assessed and compared to the corresponding single description spherical quantizer and associated bounds under various conditions, and the scheme is found to increase robustness towards packet loss.
In this work, we develop a new method for jointly optimal quantization of sinusoidal frequencies, amplitudes, and phases and apply the method to sinusoidal audio coding. This extends earlier work on quantization of sinusoidal amplitudes and phases to include frequencies. The optimization is performed for a set of sinusoids that models a short segment of an audio signal. For a given bit-rate constraint, the optimal quantizers minimize a single-letter weighted distortion measure that accounts for the perceptual importance of the sinusoids. The quantizers are derived analytically using high-rate theory. The method yields high performance and has a number of practical advantages over conventional sinusoidal quantization methods.
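To illustrate what minimizing a weighted distortion under a bit-rate constraint can look like, here is a minimal sketch of the classical high-rate bit allocation. It is not the quantizer design derived in the paper. It assumes, hypothetically, that component i contributes distortion w_i * c * 2^(-2 R_i) with a common constant c, so that the optimal rate is the average rate plus half the log-ratio of the weight to the geometric mean of all weights. The function name `allocate_rates` and the example weights are illustrative only.

```python
import math


def allocate_rates(weights, total_rate):
    """Split total_rate (bits) across components with perceptual weights.

    Assumed model (not from the source): D_i = w_i * c * 2**(-2 * R_i).
    Minimizing sum(D_i) subject to sum(R_i) == total_rate gives
    R_i = total_rate / n + 0.5 * log2(w_i / geometric_mean(weights)).
    Rates may come out negative for very small weights; this sketch
    does not clamp them.
    """
    n = len(weights)
    # log2 of the geometric mean of the weights
    log_gm = sum(math.log2(w) for w in weights) / n
    return [total_rate / n + 0.5 * (math.log2(w) - log_gm) for w in weights]


rates = allocate_rates([1.0, 4.0, 16.0], total_rate=12.0)
```

Under this model, perceptually more important components (larger weights) receive more bits, and the allocated rates always sum to the total budget.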
Sinusoidal coding is an often employed technique in low bit-rate audio coding. Therefore, methods for efficient quantization of sinusoidal parameters are of great importance. In this paper, we use high-resolution assumptions to derive analytical expressions for the optimal entropy-constrained unrestricted spherical quantizers for the amplitude, phase, and frequency parameters of the sinusoidal model. This is done both for the case of a single sinusoid and for the more practically relevant case of multiple sinusoids distributed across multiple segments. To account for psychoacoustical effects of the auditory system, a perceptual distortion measure is used. The optimal quantizers minimize a high-resolution approximation of the expected perceptual distortion, while the corresponding quantization indices satisfy an entropy constraint. The quantizers turn out to be flexible and of low complexity, in the sense that they can be determined easily for varying bit-rate requirements, without any sort of retraining or iterative procedures. In an objective comparison it is shown that, for the squared error distortion measure, the rate-distortion performance of the proposed method is very close to that of the theoretically optimal entropy-constrained vector quantization. Furthermore, for the perceptual distortion measure, the proposed scheme is shown to objectively outperform an existing sinusoidal quantization scheme in which frequency quantization is done independently. Finally, a subjective listening test, in which the proposed scheme is compared to an existing state-of-the-art sinusoidal quantization scheme with fixed quantizers for all input signals, indicates that the proposed scheme leads to an average bit-rate reduction of 20% at the same subjective quality level as the existing scheme.
In this work, we develop a new method for quantization in multistage audio coding. Given a (perceptual) distortion measure and a bit-rate constraint, we analytically derive the optimal rate distribution between subcoders (stages) and the corresponding optimal quantizers using high-rate theory. The analytical solutions for optimal quantizers allow a coder to easily adapt to changes in bit-rate requirements. As an illustration of the new method, we consider quantization in a two-stage sinusoidal/waveform coder, a widely used combination in audio coding. We show that at low total rates most of the rate should be assigned to the sinusoidal (model-based, subspace) subcoder, while at high total rates most of the rate should be assigned to the waveform (full-space) subcoder. We compare the new method to a reference quantization method that does not use rate-distortion optimization. A listening test shows significantly higher performance for the new method.
In this work, we present a new method for quantization of sinusoidal amplitudes and phases, and apply the method to sinusoidal coding of speech and audio signals. The method is based on unrestricted polar quantization, where phase quantization accuracy depends on amplitude. Amplitude and phase quantizers are derived under an entropy (average rate) constraint using high-rate assumptions. First, we derive optimal quantizers for one sinusoid and a mean-squared error distortion measure. We provide a detailed analysis of entropy-constrained unrestricted polar quantization, showing its high performance and practicality even at low rates. Second, we find optimal quantizers for a set of sinusoids that model a short segment of an audio signal. The optimization is performed using a weighted error measure that can account for the masking effect in the human auditory system. We find the optimal rate distribution between sinusoids, as well as the corresponding optimal amplitude and phase quantizers, based on the perceptual importance of sinusoids defined by masking. The new method is used in an audio-coding application and is shown to significantly outperform a conventional sinusoidal quantization method where phase quantization accuracy is identical for all sinusoids.
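The idea that phase accuracy depends on amplitude can be sketched as follows. This is a simplified illustration, not the entropy-constrained quantizers derived in the paper. It assumes a uniform amplitude quantizer with a hypothetical step size `step`, and picks the number of phase levels so that the arc length between neighbouring phase points roughly equals `step`, giving approximately square cells. The function name `upq_quantize` is invented for this example.

```python
import math


def upq_quantize(amplitude, phase, step):
    """Toy unrestricted polar quantizer (illustrative assumptions only).

    Returns (a_hat, p_hat, n_phase): reconstructed amplitude,
    reconstructed phase, and the number of phase levels used.
    """
    # Uniform amplitude quantizer with midpoint reconstruction
    a_idx = int(math.floor(amplitude / step))
    a_hat = (a_idx + 0.5) * step
    # Phase levels grow with the reconstructed amplitude so that the
    # arc length a_hat * (2*pi / n_phase) is close to the amplitude step
    n_phase = max(1, int(round(2.0 * math.pi * a_hat / step)))
    phase_step = 2.0 * math.pi / n_phase
    # Uniform phase quantizer on the circle, wrapped to [0, 2*pi)
    p_idx = int(round((phase % (2.0 * math.pi)) / phase_step)) % n_phase
    p_hat = p_idx * phase_step
    return a_hat, p_hat, n_phase


loud = upq_quantize(1.03, 0.7, step=0.1)
quiet = upq_quantize(0.23, 0.7, step=0.1)
```

In this sketch a loud sinusoid gets many more phase levels than a quiet one, whereas a conventional scheme with identical phase accuracy for all sinusoids would use the same number of levels for both.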
ISBN: 0780391543 (print)
This article deals with low bitrate object coding of musical audio, and more precisely with the extraction of pitched sound objects in polyphonic music. After a brief review of existing methods, we discuss the potential benefits of recasting this problem in a Bayesian framework. We define pitched objects by a set of probabilistic priors and derive efficient algorithms to infer active objects and their parameters. Preliminary experiments suggest that the proposed method results in better sound quality than simple sinusoidal coding while achieving a lower bitrate.
A comprehensive performance analysis of sinusoidal and code excited linear prediction (CELP) speech coding is given around 4 kbit/s, using both subjective and objective measurements. Based on the observations made, justification for the multi-modal hybrid coding approach employing both sinusoidal and CELP coding is given, and an implementation of such a coder is described. This 4 kbit/s sinusoidal/CELP speech coder classifies each input speech segment into one of four modes: voiced, jittery-voiced, plosive, and unvoiced. For voiced segments sinusoidal coding is used, whereas different CELP versions are employed for the other modes. The quality of the implemented 4 kbit/s sinusoidal/CELP speech coder in clean speech conditions is finally verified by a listening test. In the test, the 4 kbit/s coder performed almost as well as the high-quality references used, but it still needs improvements to be classified as a high-quality 4 kbit/s speech coder. (C) 2003 Elsevier B.V. All rights reserved.
Sinusoidal coding plays an important role in low bit-rate audio coding. This paper considers frequency-differential (FD) encoding of the sinusoidal model parameters as an alternative to time-differential encoding. For a given signal frame, the parameters of each sinusoidal component may be encoded either differentially relative to other components in the same frame, or directly, i.e., without differential encoding. Using basic tools from graph theory, we derive several algorithms for finding bit-rate optimal combinations of direct and differential encoding of the sinusoidal parameters. In simulation experiments with audio signals, the algorithms showed bit-rate reductions of up to 28% relative to direct encoding. Furthermore, when compared to what can be considered a traditional FD encoding scheme (as used in MPEG-4 audio), the proposed algorithms achieve bit-rate reductions of up to 6%. (C) 2003 Elsevier Science B.V. All rights reserved.
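The choice between direct and differential encoding within a frame can be sketched with a toy example. This is not one of the paper's algorithms. It assumes, hypothetically, that a component may only reference an earlier component in the list, that any component may serve as a reference for several others, and that the side information needed to signal the chosen reference is free. Under those assumptions the dependency graph is a forest, so picking the cheapest option per component is optimal. The cost model `toy_bits` (an Exp-Golomb-like code length plus a sign bit) and all names are invented for illustration.

```python
def toy_bits(value):
    """Hypothetical cost in bits of coding an integer (not from the source)."""
    # 2 * floor(log2(|value| + 1)) + 1 code bits, plus 1 sign bit
    return 2 * ((abs(value) + 1).bit_length() - 1) + 2


def choose_encoding(indices):
    """For each quantized parameter index, pick direct or differential coding.

    Returns (plan, total_bits), where plan[i] = (reference, cost) and
    reference is None for direct encoding or the position of the
    earlier component used as the differential reference.
    """
    plan = []
    total_bits = 0
    for i, value in enumerate(indices):
        best_ref, best_cost = None, toy_bits(value)  # direct encoding
        for j in range(i):  # try every earlier component as reference
            cost = toy_bits(value - indices[j])
            if cost < best_cost:
                best_ref, best_cost = j, cost
        plan.append((best_ref, best_cost))
        total_bits += best_cost
    return plan, total_bits


plan, total_bits = choose_encoding([100, 103, 210, 205])
direct_bits = sum(toy_bits(v) for v in [100, 103, 210, 205])
```

With these example indices the mixed plan needs 40 bits under the toy cost model, compared with 60 bits when every component is encoded directly.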
ISBN: 0780374029 (print)
Sinusoidal coding has proven to be efficient for low bit-rate audio coding. In this paper we consider schemes for frequency-differential (FD) encoding of the sinusoidal model parameters. For a given signal frame, the parameters of a sinusoidal component may be encoded either differentially relative to other components in the same frame, or directly, i.e., without differential encoding. Using basic tools from graph theory, two algorithms are derived for finding bit-rate optimal combinations of direct and differential encoding of the sinusoidal parameters. In simulation experiments with audio signals, the algorithms showed bit-rate reductions of up to 27% relative to direct encoding. Furthermore, when compared to a commonly used FD encoding scheme, the proposed algorithms achieved bit-rate reductions of up to 7%.
During the last decade, CD-quality digital audio has essentially replaced analog audio. Emerging digital audio applications for network, wireless, and multimedia computing systems face a series of constraints such as reduced channel bandwidth, limited storage capacity, and low cost. These new applications have created a demand for high-quality digital audio delivery at low bit rates. In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. This paper is organized as follows. First, psychoacoustic principles are described, with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Next, filter bank design issues and algorithms are addressed, with a particular emphasis placed on the modified discrete cosine transform, a perfect reconstruction cosine-modulated filter bank that has become of central importance in perceptual audio coding. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model. These discussions concentrate on architectures and applications of those techniques that utilize psychoacoustic models to exploit efficiently masking characteristics of the human receiver. Several algorithms that have become international and/or commercial standards receive in-depth treatment, including the ISO/IEC MPEG family (-1, -2, -4), the Lucent Technologies PAC/EPAC/MPAC, the Dolby(