This paper proposes an encoding method for high-quality, low-delay audio communication that is robust to losses in packetized transmission. Robustness is provided by a multiple description vector quantization (MDVQ) technique that is designed to minimize the mean-squared error (MSE). The key to applying this technique effectively is the use of psycho-acoustically controlled pre- and post-filters that make the mean-squared quantization error perceptually relevant. Experiments show that the MDVQ-based encoder yields better results, in both MSE and subjective audio quality, than simple alternative coders with the same low delay.
This article explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems. It highlights the challenges posed by the subjective evaluation processes of speech and audio codecs and discusses the limitations of purely data-driven approaches, which often require inefficiently large architectures to match the performance of model-based methods. The study presents hybrid systems as a viable solution, offering significant improvements to the performance of conventional codecs through meticulously chosen design enhancements. Specifically, it introduces a neural network-based signal enhancer that is designed to postprocess existing codecs' output, along with the autoencoder-based end-to-end models and LPCNet-hybrid systems that combine linear predictive coding (LPC) with neural networks. Furthermore, the article delves into predictive models that operate within custom feature spaces (TF-Codec) or predefined transform domains (MDCTNet) and examines the use of psychoacoustically calibrated loss functions to train end-to-end neural audio codecs. Through these investigations, the article demonstrates the potential of hybrid systems to advance the field of speech and audio coding by bridging the gap between traditional model-based approaches and modern data-driven techniques.
There is a considerable performance gap between the current scalable audio coding schemes and a nonscalable coder operating at the same bitrate. This suboptimality results from the independent coding of the layers in these systems. One of the aspects that plays a role in this suboptimality is the entropy coding. In practical audio coding systems, including MPEG advanced audio coding (AAC), the transform domain coefficients are quantized using an entropy-constrained quantizer. In MPEG-4 scalable AAC (S-AAC), the quantization and coding are performed separately at each layer. In the case of Huffman coding, the redundancy introduced by the entropy coding at each layer is larger at lower quantization resolutions. Also, the redundancy for the overall coder becomes larger as the number of layers increases. In fact, there is a trade-off between the overall redundancy and the fine-grain scalability, in which the bitrate per layer is smaller and more layers are required. In this paper, a fine-grain scalable coder for audio signals is proposed where the entropy coding of a quantizer is made scalable via joint design of entropy coding and quantization. By constructing a Huffman-like coding tree where the internal nodes can be mapped to the reconstruction points, the tree can be pruned at any internal node to control the rate-distortion (RD) performance of the encoder in a fine-grain manner. A set of metrics and a trellis-based approach are proposed to create a coding tree so that an appropriate path is generated on the RD plane. The results show that the proposed method outperforms scalable audio coding based on reconstruction error quantization as used in practical systems, e.g., in S-AAC.
This paper considers the problem of selecting a set of parameter values from a given parameter space, in order to perform rate-distortion optimization in the context of audio compression. Due to interdependencies between parameters, separate optimization of parameter values is inherently suboptimal, yet a straightforward brute-force joint search involves prohibitive computational complexity. This work proposes a new method for joint rate-distortion optimization, while accounting for interparameter dependencies. The optimal solution is achieved, at significantly reduced complexity as compared to a brute-force search, by employing a Viterbi search over a trellis. Two objective distortion metrics are specifically considered: the average, and the maximum noise-to-mask ratio. Subjective (AB/MOS) and objective (average/maximum noise-to-mask ratio) tests demonstrate considerable gains at low bit rates of 16 kbps per channel for a 44.1-kHz sampled audio signal using the proposed approach.
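The idea of replacing a brute-force joint search with a Viterbi search over a trellis can be illustrated with a minimal sketch. It is not the paper's algorithm: the stage costs, the transition cost, and the names `viterbi_select` and `brute_force` are all assumptions made for illustration. Each stage stands for one parameter to choose, each state for a candidate value, and the transition cost models the dependency between neighbouring choices (for example, the bits spent on differentially coding one parameter relative to the previous one).

```python
from itertools import product


def path_cost(path, node_cost, transition_cost):
    """Total cost of one complete choice of states (one per stage)."""
    own = sum(node_cost[t][s] for t, s in enumerate(path))
    links = sum(transition_cost(a, b) for a, b in zip(path, path[1:]))
    return own + links


def viterbi_select(node_cost, transition_cost):
    """Minimum-cost path through the trellis.

    node_cost[t][s] is the (hypothetical) cost of picking state s at stage t,
    e.g. distortion + lambda * rate; transition_cost(p, s) is the extra cost
    of following state p with state s.
    """
    n_states = len(node_cost[0])
    best = list(node_cost[0])  # best cost of any path ending in each state
    back = []                  # back-pointers, one list per later stage
    for t in range(1, len(node_cost)):
        new_best, pointers = [], []
        for s in range(n_states):
            prev = min(range(n_states),
                       key=lambda p: best[p] + transition_cost(p, s))
            new_best.append(best[prev] + transition_cost(prev, s)
                            + node_cost[t][s])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the survivor path back from the cheapest final state.
    state = min(range(n_states), key=lambda s: best[s])
    total = best[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return total, path


def brute_force(node_cost, transition_cost):
    """Exhaustive reference search; cost grows as n_states ** n_stages."""
    n_states = len(node_cost[0])
    candidates = product(range(n_states), repeat=len(node_cost))
    best = min(candidates,
               key=lambda p: path_cost(p, node_cost, transition_cost))
    return path_cost(best, node_cost, transition_cost), list(best)
```

For T stages and S states per stage, the trellis search evaluates on the order of T·S² transitions, whereas the exhaustive search visits S^T paths. Both return the same minimum cost whenever the total cost decomposes into per-stage and neighbour-to-neighbour terms, which is the assumption built into this sketch.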
Hybrid In Band on Channel (IBOC) digital audio broadcasting simultaneously with analog amplitude modulation (AM) has been proposed as a hybrid solution to digital audio broadcasting in the AM band. Since the AM band is crowded and since the available bandwidth per program is limited, adding digital transmission is a challenging proposition. To achieve FM-like audio quality, an audio coder rate of 32-64 kb/sec may be required. One of the currently proposed hybrid IBOC-AM systems is 30 kHz wide. Severe second adjacent interference may occur in certain geographical areas. This may lead to loss of 40% of the effective transmission audio bit rate. For coping with such harsh transmission conditions, we present a solution based on embedded/multidescriptive audio coding with matched multistream transmission in separate frequency bands. With loss of one frequency band, the embedded system blends to a lower audio coder rate with a much better quality than analog AM. The nonembedded system without multistream transmission fails catastrophically when a little more than one sideband is severely interfered with, causing a severe discontinuity in quality while blending directly to analog AM. A number of detailed robust embedded systems are outlined. We also show how multistream transmission schemes can be used with nonembedded audio coders. Both daytime and nighttime scenarios are included. This paper contains a catalog of possible systems for different audio quality levels and interference scenarios, including systems with 20 kHz bandwidth rather than 30 kHz.
The recently approved CCITT Recommendation G.722 on 7-kHz audio coding within 64 kb/s is described. A review of the historical background and some basic requirements and choices for such an algorithm is followed by a high-level description of the recommended algorithm and a brief discussion of the detailed arithmetical choices that have been made. Some performance results are provided for speech and music signals. Finally, the main applications foreseen are reviewed, along with the resulting system considerations.
In this letter, we present a decomposition for sinusoidal coding of audio, based on an amplitude modulation of sinusoids via a linear combination of arbitrary basis vectors. The proposed method, which incorporates a perceptual distortion measure, is based on a relaxation of a nonlinear least-squares minimization. Rate-distortion curves and listening tests show that, compared to a constant-amplitude sinusoidal coder, the proposed decomposition offers perceptually significant improvements in critical transient signals.
The modified discrete cosine transform (MDCT) is employed in subband/transform coding schemes as the analysis/synthesis filter bank based on time domain aliasing cancellation (TDAC). The most efficient implementation of the forward and inverse MDCT computation for layer III in MPEG-1 and MPEG-2 international audio coding standards is proposed. It is based on a new fast algorithm for the forward and inverse MDCT computation in the oddly stacked system. The complete signal flow graphs for the implementation of MDCT and inverse MDCT in layer III are also provided.
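To make the TDAC property concrete, the following is a minimal sketch of a direct (O(N²), not fast) forward and inverse MDCT in the oddly stacked form, with 50% overlap-add. It is not the fast algorithm proposed in the paper. The sine window, the 2/N synthesis scaling, and the function names are assumptions chosen for illustration; the window is used for both analysis and synthesis and satisfies w[n]² + w[n+N]² = 1.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(frame, window):
    """Direct forward MDCT: 2N windowed samples -> N coefficients."""
    N = len(frame) // 2
    return [
        sum(window[n] * frame[n]
            * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs, window):
    """Direct inverse MDCT: N coefficients -> 2N windowed, aliased samples."""
    N = len(coeffs)
    return [
        window[n] * (2.0 / N)
        * sum(coeffs[k]
              * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
              for k in range(N))
        for n in range(2 * N)
    ]


def reconstruct(signal, N):
    """Analyse and resynthesise a signal with hop size N and overlap-add."""
    window = sine_window(N)
    tail = (-len(signal)) % N  # pad so the length is a multiple of N
    padded = [0.0] * N + list(signal) + [0.0] * (tail + N)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = imdct(mdct(padded[start:start + 2 * N], window), window)
        for n in range(2 * N):
            out[start + n] += block[n]  # aliasing cancels between neighbours
    return out[N:N + len(signal)]
```

Each individual inverse transform contains time-domain aliasing, but the aliasing terms of adjacent frames have opposite signs and cancel in the overlap-add, so `reconstruct` returns the input up to floating-point rounding even though each frame of 2N samples is represented by only N coefficients.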
This letter proposes a new masking threshold adjustment method to improve the quality of speech signals in low bit-rate audio coding. The Enhanced aacPlus (EAAC) audio codec raises the masking threshold of all frequency bands to suit the given encoding rate by considering equal-loudness noise only, which is a representative way of implementing the adjustment technique. The proposed method, however, dynamically adjusts the masking threshold of each frequency band based on the ratio of that band's energy to the average band energy. More quantization noise is allowed in formant regions, which have relatively large energy ratios, and less distortion is allowed in spectral valley regions, which ultimately helps to enhance the perceptual quality of speech signals. The proposed idea reflects the spectral weighting criterion used when searching for optimal excitation codebooks in many speech coding algorithms. Simulation results confirm that the proposed method, implemented on the EAAC coder, improves quality for speech input signals at the same bit rate while keeping equivalent quality for music content.
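The direction of the per-band adjustment can be shown with a small sketch. The rule below is hypothetical and is not the formula from the letter: the power-law mapping, the exponent `alpha`, and the name `adjust_thresholds` are assumptions introduced only to illustrate scaling each band's threshold by its energy relative to the average band energy.

```python
def adjust_thresholds(thresholds, energies, alpha=0.5):
    """Scale each band's masking threshold by (E_band / E_mean) ** alpha.

    Hypothetical rule for illustration: bands above the average energy
    (formant regions) get a higher threshold, so more quantization noise is
    tolerated there; bands below the average (spectral valleys) get a lower
    threshold, so less noise is tolerated.
    """
    if len(thresholds) != len(energies):
        raise ValueError("one energy value is needed per threshold")
    mean_energy = sum(energies) / len(energies)
    if mean_energy <= 0:
        raise ValueError("average band energy must be positive")
    return [t * (e / mean_energy) ** alpha
            for t, e in zip(thresholds, energies)]
```

With `alpha = 0` the thresholds are left unchanged, and larger values of `alpha` shift more of the noise budget toward high-energy bands. A real encoder would additionally have to rescale the result to meet its bit budget, which this sketch does not attempt.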
The modulated lapped transform (MLT) is used in both audio and video data compression schemes. This paper describes its properties and how it can be used to generate a time-varying filterbank. Examples of its implementation in two audio coding standards are presented.