ISBN: 9781424414833 (print)
Spatially Squeezed Surround Audio Coding (S3AC) has previously been shown to provide efficient coding with perceptually accurate soundfield reconstruction when applied to ITU 5.1 multichannel audio. This paper investigates the application of S3AC to the coding of Ambisonic audio recordings. Traditional Ambisonics achieves compression and backward compatibility by using the UHJ matrixing approach to obtain a stereo signal. In this paper, the relationship to Ambisonic B-format signals is described, and alternative approaches that derive a stereo or mono downmix signal based on S3AC are presented and evaluated. The mono-downmix approach uses side information consisting of spatial cues that are quantized based on novel source localization listening experiments. Objective and subjective tests demonstrate significant improvements in the localization of sound sources when the compressed B-format signals are decoded to a 5.1 speaker playback.
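As background for the B-format signals mentioned above, the following is a minimal sketch of the conventional first-order B-format encoding of a single point source, together with a simple azimuth estimate taken from the W, X and Y channels. It is not the S3AC method from the paper: the function names `encode_b_format` and `estimate_azimuth` are hypothetical, and the 1/sqrt(2) weighting of W follows the traditional B-format convention, which is assumed here.

```python
import math


def encode_b_format(sample, azimuth, elevation=0.0):
    """Encode one mono sample as first-order B-format (W, X, Y, Z).

    Angles are in radians. The 1/sqrt(2) gain on W is the traditional
    B-format convention (an assumption; other normalizations exist).
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z


def estimate_azimuth(w, x, y):
    """Estimate the source azimuth of a single-source sample.

    Multiplying X and Y by W removes the sign ambiguity that a negative
    sample value would otherwise introduce.
    """
    return math.atan2(w * y, w * x)


w, x, y, z = encode_b_format(1.0, math.radians(60.0))
print(math.degrees(estimate_azimuth(w, x, y)))
```

An azimuth recovered in this way is one example of a spatial cue that could accompany a mono downmix as side information; how the paper actually derives and quantizes its cues is not specified in the abstract.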
Audio processing applications such as rate determination, bandwidth extension, compression, and noise reduction make use of loudness metrics. Most loudness estimation algorithms are computationally expensive and often unsuitable for real-time applications. In this paper, we present a low-complexity loudness estimation algorithm applicable to both steady and time-varying sounds. The model computes an estimate of the excitation pattern by simultaneously pruning the frequency components and detector locations. Comparative results indicate that the proposed algorithm performs consistently well for different types of audio signals at a reduced complexity.
We present the ***-VBR winning candidate codec recently selected by Question 9 of Study Group 16 (Q9/16) of ITU-T as a baseline for the development of a scalable solution for wideband speech and audio compression at rates between 8 kb/s and 32 kb/s. The Q9/16 codec is an embedded codec comprising 5 layers, where higher-layer bitstreams can be discarded without affecting the decoding of the lower layers. The two lower layers are based on CELP technology, and the core layer takes advantage of signal-classification-based encoding. The higher layers encode the weighted error signal from the lower layers using overlap-add transform coding. The codec has been designed with the primary objective of high-performance wideband speech coding for error-prone telecommunications channels, without compromising the quality for narrowband/wideband speech or wideband music signals. The codec performance is demonstrated with selected test results.
This paper describes a new search algorithm that quickly finds interchannel relationships between a coding channel and a reference channel in the multichannel coding tool of the MPEG-4 Audio Lossless Coding (ALS) international standard. The algorithm has a tree structure and can reduce data size with a significantly smaller computational load than the conventional one. The devised method is based on a restricted greedy algorithm: it chooses the most efficient branch that does not create any loops in the existing path. The results of comprehensive evaluations show that this method maintains the compression performance (compression to around 1/3) and runs 1000 times as fast as the conventional method for 512-channel magnetoencephalography signals. This algorithm enables practical lossless compression of biomedical data by ALS and, at the same time, opens the way to a new multichannel analysis tool that may be used for purposes other than compression. The continual maintenance of this standard will make it possible to perfectly reconstruct encoded files even 100 years from now.
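The idea of greedily picking the most efficient branch while avoiding loops can be illustrated with a small sketch. This is not the algorithm from the paper or from the ALS reference software: the function `greedy_reference_tree`, the `gains` table of estimated bit savings, and the rule that each coding channel gets at most one reference channel are all assumptions made for illustration.

```python
def greedy_reference_tree(gains):
    """Assign reference channels greedily without creating loops.

    gains maps (coding_channel, reference_channel) to a hypothetical
    estimate of the bits saved by predicting the coding channel from
    the reference channel. Returns {coding_channel: reference_channel}.
    """
    reference_of = {}
    # Visit candidate branches from the largest saving to the smallest.
    for (coding, ref), gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        if gain <= 0 or coding in reference_of:
            continue
        # Follow the chain of references starting at ref. If it leads
        # back to the coding channel, this branch would close a loop.
        node = ref
        while node in reference_of and node != coding:
            node = reference_of[node]
        if node == coding:
            continue
        reference_of[coding] = ref
    return reference_of


gains = {(1, 0): 5.0, (0, 1): 4.0, (2, 1): 3.0, (0, 2): 2.0}
print(greedy_reference_tree(gains))
```

In this toy input, channel 1 is predicted from channel 0 and channel 2 from channel 1; the branches (0, 1) and (0, 2) are rejected because each would close a loop.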
Audio coding based on Frequency Domain Linear Prediction (FDLP) uses an autoregressive model to approximate Hilbert envelopes in frequency sub-bands over relatively long temporal segments. Although the basic technique achieves good quality of the reconstructed signal, there is a need to improve the coding efficiency. In this paper, we present a novel method for applying temporal masking to reduce the bit-rate in an FDLP-based codec. Temporal masking refers to the hearing phenomenon in which exposure to a sound reduces the response to following sounds for a certain period of time (up to 200 ms). In the proposed version of the codec, a first-order forward masking model of the human ear is implemented, and informal listening experiments using additive white noise are performed to obtain the exact noise masking thresholds. Subsequently, this masking model is employed in encoding the sub-band FDLP carrier signal. Applying temporal masking in the FDLP codec results in a bit-rate reduction of about 10% without degrading the quality. Performance is evaluated with Perceptual Evaluation of Audio Quality (PEAQ) scores and with subjective listening tests.
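To make the notion of a first-order forward masking model concrete, here is a minimal sketch in which the masking threshold left behind by a sound decays exponentially from frame to frame. It is an illustrative one-pole model, not the model used in the paper: the function name `forward_masking_threshold` and the parameters `frame_ms`, `tau_ms` and `offset_db`, including their default values, are assumptions.

```python
import math


def forward_masking_threshold(power, frame_ms=1.0, tau_ms=50.0, offset_db=-10.0):
    """Track a decaying masking threshold over a sequence of frame powers.

    power:     linear signal power per frame
    frame_ms:  frame spacing in milliseconds (assumed value)
    tau_ms:    decay time constant of the masker (assumed value)
    offset_db: threshold relative to the masker power (assumed value)
    """
    decay = math.exp(-frame_ms / tau_ms)
    offset = 10.0 ** (offset_db / 10.0)
    thresholds = []
    previous = 0.0
    for p in power:
        # The threshold is set either by the current frame or by the
        # decayed threshold carried over from earlier frames.
        current = max(offset * p, decay * previous)
        thresholds.append(current)
        previous = current
    return thresholds


print(forward_masking_threshold([1.0, 0.0, 0.0, 0.0]))
```

In a codec, quantization noise that stays below such a threshold in the frames following a loud sound would be expected to be inaudible, which is where the bit-rate saving comes from.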
A new scheme for sinusoidal audio coding, named multiple description spherical trellis-coded quantization, is proposed, and analytic expressions for the point densities and expected distortion of the quantizers are derived based on a high-resolution assumption. The proposed quantizers are of variable dimension, i.e., sinusoids can be quantized jointly for each audio segment, whereby a lower distortion is achieved. The quantizers are designed to minimize a perceptual distortion measure subject to an entropy constraint for a given packet-loss probability. In experiments, the performance of the quantizers is compared with that of the corresponding single-description spherical quantizer and the associated bounds, and the proposed quantizers are found to increase robustness to packet losses.
The fact that audio compression for streaming or storage is usually performed offline alleviates traditional constraints on encoding delay. We propose a rate-distortion optimized approach, within the MPEG Advanced Audio Coding framework, that trades delay for optimal window switching, resource allocation, and selection of quantization and coding parameters for the entire audio file using a two-layered trellis. Stages of the outer trellis correspond to audio frames, nodes represent window choices, and branches implement transition constraints. The inner trellis operates within each node of the outer layer and has stages corresponding to scalefactor bands and nodes representing combinations of quantization and coding parameters. A suitable cost, comprising bit consumption and psychoacoustic distortion, is optimized via multiple passes through the two-layered trellis to achieve the desired bitrate. The procedure thus optimizes most of the encoding decisions involved in audio compression. Objective and subjective tests show considerable performance gains.
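The outer trellis described above can be sketched as a standard dynamic-programming (Viterbi-style) search. The sketch below covers only the outer layer and takes the per-node costs as given; in the paper these would come from the inner trellis. The function name `best_window_path`, the cost values, and the `ALLOWED` transition table (a simplified reading of the AAC window sequence rules) are assumptions for illustration.

```python
# Assumed transition constraints between window choices (simplified).
ALLOWED = {
    "LONG": ("LONG", "START"),
    "START": ("SHORT", "STOP"),
    "SHORT": ("SHORT", "STOP"),
    "STOP": ("LONG", "START"),
}


def best_window_path(frame_costs):
    """Find the cheapest window sequence over all frames.

    frame_costs is a list with one dict per frame, mapping each window
    choice to a hypothetical cost (for example bits plus weighted
    distortion). Returns (path, total_cost).
    """
    best = dict(frame_costs[0])  # cheapest cost of a path ending in each node
    back = []                    # back-pointers, one dict per later frame
    for costs in frame_costs[1:]:
        new_best, pointers = {}, {}
        for prev, total in best.items():
            for nxt in ALLOWED[prev]:
                if nxt not in costs:
                    continue
                candidate = total + costs[nxt]
                if nxt not in new_best or candidate < new_best[nxt]:
                    new_best[nxt] = candidate
                    pointers[nxt] = prev
        best = new_best
        back.append(pointers)
    # Trace the cheapest final node back to the first frame.
    end = min(best, key=best.get)
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[end]


costs = [
    {"LONG": 1, "START": 5, "SHORT": 9, "STOP": 9},
    {"LONG": 10, "START": 10, "SHORT": 1, "STOP": 4},
    {"LONG": 1, "START": 9, "SHORT": 9, "STOP": 2},
]
print(best_window_path(costs))
```

Because the search sees the whole file, it can accept a locally expensive choice in one frame when that choice makes a much cheaper window legal in the next frame, which a frame-by-frame encoder cannot do.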
In this paper, we introduce a new watermarking model combining a joint time-frequency (TF) representation, obtained with the molecular matching pursuit (MMP) algorithm, and a psychoacoustic model. We take advantage of the notion of signal structure introduced by MMP to obtain a precise representation of audio signals, and then use a psychoacoustic model to embed a watermark efficiently in the signal. By selecting atoms of TF components that are not perceptible to the human ear, we ensure the security and imperceptibility of the watermark. Then, by judicious selection of the watermark host spots, we ensure the robustness of the watermark to the main kinds of signal attacks, including lossy compression. The robustness of the proposed method demonstrates the potential of joint TF representation techniques as viable watermarking schemes.
A dynamic subband design selection scheme for perceptual audio coding is presented that optimizes the coding efficiency together with an improved utilization of the masking potential of the signal. The choices for subband design selection include critical bands, uniform subbands, uniform subbands with a non-uniform last band, and a non-uniform-non-critical (NUNC) subband distribution. Particular consideration is given to the corresponding critical band boundaries when formulating the NUNC subbands, to minimize the perception of coding noise. The design selection decision is made by first identifying the best tradeoff between empty quantizer slots and the amount of side information, followed by a minimization of the perceptual entropy (PE) estimates. A further analysis is carried out to identify the prominent spectral peaks (PSPs), so that their masking potential can be used efficiently by the bit allocation algorithm, which cuts bits first from subbands with PSPs. Test results based on ITU-R Recommendation BS.1116 show that our coding scheme performs slightly better than the MPEG-4 AAC verification model (VM) for a majority of signal types from the SQAM database. The paper concludes with a discussion of the future research implications of the work.