Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present a psychoacoustic calibration scheme that redefines the loss functions of neural audio coding systems so that they can decode signals that are perceptually more similar to the reference, yet with a much lower model complexity. The proposed loss function incorporates the global masking threshold, tolerating reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms a baseline neural codec that is twice as large and consumes 23.4% more bits per second. With the proposed method, a lightweight neural codec with only 0.9 million parameters performs near-transparent audio coding comparable to the commercial MPEG-1 Audio Layer III codec at 112 kbps.
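To illustrate how a masking threshold can enter a loss function, the following is a minimal sketch and not the authors' formulation. It assumes a per-bin weight of log10(1 + signal power / masking threshold), so that errors in heavily masked bins contribute little to the loss. The function names, the weighting formula, and the toy spectra are all hypothetical; the abstract does not specify them.

```python
import math

def perceptual_weights(signal_power, mask_threshold):
    """Per-bin weights (assumed form): large where the signal rises well above the mask."""
    return [math.log10(1.0 + p / m) for p, m in zip(signal_power, mask_threshold)]

def weighted_spectral_loss(ref_mag, dec_mag, mask_threshold):
    """Squared spectral error, weighted so that masked bins are penalised less."""
    power = [r * r for r in ref_mag]
    weights = perceptual_weights(power, mask_threshold)
    return sum(w * (r - d) ** 2 for w, r, d in zip(weights, ref_mag, dec_mag))

# Toy magnitudes and masking thresholds (made-up values, must be positive).
ref = [1.0, 0.5, 0.1]
mask = [0.01, 0.01, 1.0]  # the third bin lies far below its masking threshold

# The same absolute error (0.1) placed in an audible bin versus a masked bin.
err_audible = weighted_spectral_loss(ref, [0.9, 0.5, 0.1], mask)
err_masked = weighted_spectral_loss(ref, [1.0, 0.5, 0.0], mask)
print(err_audible > err_masked)
```

Under this assumed weighting, a model is free to spend its capacity on the bins where errors would be heard, which is the intuition behind allowing inaudible reconstruction error.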
ISBN: 9781509066315 (print)
A process of spectral recovery can enhance the performance of transform-based audio coding by transmitting only a portion of the spectral data and recovering the missing spectral data in the decoder. This study proposes an enhanced method of audio coding based on spectral recovery with an adaptive structure that yields improved sound quality compared with the previous method. The spectral data to be recovered are arranged in an adaptive pattern depending on the difficulty of recovery. In addition, depending on the spectral characteristics, prior information associated with these spectral data is selectively transmitted to help a neural network recover magnitudes more accurately. The prior information also provides the signs of the recovered magnitudes. A subjective performance evaluation shows that, for mono coding without window switching at 40 kbps, the proposed coding method provides better sound quality than the conventional method on average.
In this paper, an audio coding scheme based on empirical mode decomposition (EMD) in association with a psychoacoustic model is presented. The method adaptively breaks the audio signal down into intrinsic oscillatory components, called Intrinsic Mode Functions (IMFs), that are fully described by their local extrema. These extrema are encoded. The coding is carried out frame by frame, and no assumption is made about the signal to be coded. The number of allocated bits varies from mode to mode and obeys the coding error inaudibility constraint. Due to the symmetry of an IMF, only the extrema (maxima or minima) of one of its interpolating envelopes are perceptually coded. In addition, to deal with rapidly changing audio signals, a stationarity index is used, and when a transient is detected, the frame is split into two overlapping sub-frames. At the decoder side, the IMFs are recovered from the associated coded maxima, and the original signal is reconstructed by summing the IMFs. The performance of the proposed coding is analyzed and compared with that of the MP3 and AAC codecs and the wavelet-based coding approach. For the mono audio signals analyzed, the results show that the proposed coding scheme outperforms the MP3 and wavelet-based coding methods and performs slightly better than the AAC codec, thus showing the potential of EMD for data-driven audio coding. (C) 2020 Elsevier Inc. All rights reserved.
It was recently shown that the combination of source prediction, two-times oversampling, and noise shaping can be used to obtain a robust (multiple-description) audio coding framework for networks with packet loss probabilities less than 10%. Specifically, it was shown that audio signals could be encoded into two descriptions (packets), which were sent separately over a communication channel. Each description yields a desired performance by itself, and when they are combined, the performance is improved. This paper extends the previous work to an arbitrary number of descriptions (packets) by using fractional oversampling and a new decoding principle. We demonstrate that, due to source aliasing, existing MSE-optimized rules for reconstruction from noisy sampled data perform poorly from a perceptual point of view. A simple reconstruction rule is proposed that improves the PEAQ objective difference grades (ODG) by more than 2 points. The proposed audio coder enables low-delay, high-quality audio streaming on networks with late packet arrivals or packet losses. With a coding delay of 2.5 ms and a total bitrate of 300 kbps, it is demonstrated that mean PEAQ ODGs around -0.65 can be obtained for 48 kHz (mono) music (pop & rock) at packet loss probabilities of 20%.
ISBN: 9781479981311 (print)
This study proposes a new method of audio coding based on spectral recovery, which can enhance the performance of transform audio coding. An encoder represents the spectral information of an input in a time-frequency domain and transmits only a portion of it, so that the remaining spectral information can be recovered from the transmitted information. A decoder recovers the magnitudes of the missing spectral information using a convolutional neural network. The signs of the missing spectral information are either transmitted or randomly assigned, according to their importance. By combining transmission and recovery of spectral information, the proposed method can enhance coding performance compared with conventional transform coding. The subjective performance evaluation shows that, for mono coding at 39.4 kbps, the proposed method provides higher sound quality than USAC, by 8.5 MUSHRA points on average.
ISBN: 9781479981311 (print)
Virtual Reality (VR) audio scenes may be composed of a very large number of audio elements, including dynamic audio objects, fixed audio channels and scene-based audio elements such as Higher Order Ambisonics (HOA). Potentially, the subjective listening experience may be replicated using a compact spatial format with a set number of dynamic objects and scene-based elements, retaining only the perceptual essence of the audio scene. The compact format would further enable a reduction in the complexity of subsequent compression and rendering. This paper investigates these hypotheses by exploring the use of a compact format that consists of up to four dynamic objects and nine HOA channels, with the Enhanced Voice Services (EVS) codec being applied to a 4-channel down-mix of the compact format.
ISBN: 9781728172026 (digital); 9781728172033 (print)
DRA (Digital Rise Audio) is a specification for multi-channel digital audio coding technology that was issued as a Chinese national standard in 2008. Three novel algorithms, namely differential Huffman, context-based Huffman, and context-based arithmetic entropy coding, are presented to improve the compression efficiency of DRA entropy coding by 3%, 11.11% and 13.52%, respectively, at the expense of different implementation complexities and intra-frame and/or inter-frame error propagation in live transmission.
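The idea behind differential Huffman coding can be sketched as follows. This is a generic illustration, not the DRA codebooks or the paper's algorithm: it assumes slowly varying quantization indices (a made-up toy sequence), codes the differences between neighbours instead of the raw values, and compares the payload size of a Huffman code built from each representation. Codebook transmission cost is ignored, and all function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from the data."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in the tree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def total_bits(symbols):
    """Payload size in bits when each symbol uses its Huffman code length."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)

def differences(values):
    """Keep the first value, then code each value relative to its predecessor."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def undo_differences(diffs):
    """Decoder side: accumulate the differences to restore the original values."""
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out

# Toy, slowly varying index sequence (an assumption made for illustration).
indices = [10, 11, 12, 13, 14, 15, 16, 17, 16, 15, 14, 13, 12, 11, 10, 11]
direct_bits = total_bits(indices)
diff_bits = total_bits(differences(indices))
print(direct_bits, diff_bits)
```

The sketch also hints at the error-propagation cost mentioned in the abstract: because `undo_differences` accumulates values, a single corrupted difference shifts every index decoded after it.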
ISBN: 9781538669907; 9781538669891 (print)
Ultra High Definition Television (UHDTV) has been a promising system for future TV broadcasting, in which 22.2 multi-channel audio is adopted to create three-dimensional (3D) audio in the home listening area. In the development stage, the Moving Picture Experts Group (MPEG) Advanced Audio Coding (AAC) standard was chosen for audio encoding. However, this choice is questionable, as its performance deteriorates as the bit-rate decreases. In this paper, Spatial Audio Coding (SAC) with an optimization technique is proposed because of its strength at lower bit-rates and its backward compatibility with configurations that have fewer channels. The experimental results show that the proposed method achieves an Objective Difference Grade (ODG) above -1, which is rated as excellent, at bit-rates starting from 1200 kb/s.
ISBN: 9780992862671 (print)
A novel approach to PWM coding is introduced, based on generating two complementary PWM streams out of the in-band signal's spectrum. The two streams are then recombined in a suitable way, so that out-of-phase cancellation of the carrier frequency harmonics is achieved. The approach suppresses the strong out-of-band frequencies of the carrier signal without introducing distortion of the in-band coded signal. Such a method can achieve greater out-of-band artefact suppression than the traditional analog low-pass filters employed in typical Class-D audio amplifiers or other switching power delivery systems, hence allowing designs with reduced filter requirements or even filterless implementations.
Perceptual audio coding schemes typically apply the modified discrete cosine transform (MDCT) with different lengths and windows, and utilize signal-adaptive switching between these on a per-frame basis for best subjective performance. In previous papers, the authors demonstrated that further quality gains can be achieved for some input signals using additional transform kernels such as the modified discrete sine transform (MDST) or greater inter-transform overlap by means of a modified extended lapped transform (MELT). This work discusses the algorithmic procedures and codec modifications necessary to combine all of the above features (transform length, window shape, transform kernel, and overlap ratio switching) into a flexible input-adaptive coding system. It is shown that, due to full time-domain aliasing cancelation, this system supports perfect signal reconstruction in the absence of quantization and, thanks to fast realizations of all transforms, increases the codec complexity only negligibly. The results of a 5.1 multichannel listening test are also reported.
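The perfect-reconstruction property mentioned above can be checked numerically for the plain MDCT case. The sketch below is a direct (slow, O(N^2)) textbook MDCT with a sine window and 50% overlap-add; it does not implement the MDST or MELT kernels or any of the switching logic described in the abstract. The function names, the frame size, and the test signal are assumptions chosen for illustration.

```python
import math

def sine_window(length):
    """Sine window; satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame, window):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    n_half = len(frame) // 2
    n0 = 0.5 + n_half / 2.0
    return [
        sum(frame[n] * window[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2.0
    return [
        (2.0 / n_half) * window[n]
        * sum(coeffs[k] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]

def analysis_synthesis(signal, n_half):
    """Transform and inverse-transform with overlap-add, without quantization."""
    assert len(signal) % n_half == 0, "sketch assumes a whole number of hops"
    window = sine_window(2 * n_half)
    # Zero-pad one hop on each side so every real sample is covered by two frames.
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        frame = padded[start:start + 2 * n_half]
        block = imdct(mdct(frame, window), window)
        for n, value in enumerate(block):
            out[start + n] += value  # aliasing terms of adjacent frames cancel here
    return out[n_half:-n_half]

signal = [math.sin(0.3 * n) + 0.5 * (n % 5) for n in range(32)]
restored = analysis_synthesis(signal, n_half=8)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(max_error < 1e-9)
```

Each individual inverse transform contains time-domain aliasing; the input is recovered only after the overlapping halves of neighbouring frames are added. A quantizer placed between `mdct` and `imdct` would break exact reconstruction, which is why the abstract qualifies the claim with "in the absence of quantization".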