This paper presents comprehensive optimizations of a neural audio coder built upon a variational autoencoder (VAE) system integrated with an arithmetic coder. Our optimizations focus on two primary aspects: a novel loss function design and advanced entropy modeling of the bottleneck latent embeddings. The loss function incorporates parameters from a psychoacoustic model (PAM) into the frame-wise distortion measure, which improves perceptual quality. In addition, a multi-time-scale discriminator is used to minimize distortions across adjacent frames, reducing artifacts at frame edges. The coder is also optimized under three entropy models of the latent domain: the Factorized Entropy Model (FEM), the Hyperprior Model (HPM), and the Joint Hierarchical Model (JHM). Notably, the JHM enhances context modeling across frames to better predict components affected by long-term dependencies. To verify the optimization performance, we conducted extensive experiments on a dataset of commercial movie clips and two additional public datasets. Objective metrics consistently showed that our optimized loss function and latent modeling outperformed traditional codecs such as LAME-MP3 and FDK-AAC on all test datasets. Subjective assessments also indicated that our system offers auditory quality comparable or superior to that of FDK-AAC.
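To illustrate what "incorporating PAM parameters into a frame-wise distortion measure" can look like, here is a minimal NumPy sketch of a noise-to-mask-ratio (NMR) style distortion. It is not the paper's loss function; it is a sketch under stated assumptions.

- The per-bin masking threshold is assumed to be supplied by some psychoacoustic model (it is not computed here).
- The function name `pam_weighted_distortion` is hypothetical.
- It is a non-differentiable evaluation metric rather than a training loss.

```python
import numpy as np

def pam_weighted_distortion(ref_frames, dec_frames, mask_threshold, eps=1e-10):
    """Hypothetical NMR-style frame-wise distortion (illustrative, not the paper's loss).

    ref_frames, dec_frames: (num_frames, frame_len) time-domain frames.
    mask_threshold: (num_frames, frame_len // 2 + 1) per-bin masking threshold
        in power units, ASSUMED to come from a psychoacoustic model.
    Returns the mean per-frame log noise-to-mask ratio in dB.
    """
    window = np.hanning(ref_frames.shape[1])
    ref_spec = np.fft.rfft(ref_frames * window, axis=1)
    dec_spec = np.fft.rfft(dec_frames * window, axis=1)
    # Coding noise power in each frequency bin of each frame.
    noise_power = np.abs(ref_spec - dec_spec) ** 2
    # Noise is weighted by how much of it the PAM considers audible.
    nmr = noise_power / (mask_threshold + eps)
    frame_nmr = nmr.mean(axis=1)
    return float(np.mean(10.0 * np.log10(frame_nmr + eps)))
```

Lower values mean the coding noise sits further below the masking threshold. A noise-free decode reaches the floor of 10·log10(eps) dB.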
ISBN: (print) 9798350344868; 9798350344851
This study investigates the implications of using the psychoacoustic model (PAM) within the neural audio coder (NAC), focusing specifically on the masking of quantization noise. We introduce a novel training strategy that incorporates the PAM into the NAC more accurately. The method uses a discriminator that measures the PAM loss either directly or indirectly. For the indirect measurement, a multi-scale STFT discriminator (MS-STFTD) is incorporated to introduce an auxiliary loss term in addition to the existing PAM loss. For the direct measurement, we designed a multi-scale PAM discriminator (MS-PAMD) that quantifies PAM-specific parameters. Experimental results show that adding the discriminator masks quantization noise more effectively than the previous NAC and yields audio quality comparable to commercial AAC in both objective and subjective scores.
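A multi-scale STFT discriminator judges the signal at several time-frequency resolutions at once. The NumPy sketch below shows only the multi-resolution input representation; the sub-discriminator networks are omitted. Several choices are assumptions and are not taken from the paper:

- the FFT sizes (512, 1024, 2048);
- the hop of `n_fft // 4`;
- the Hann window and log compression;
- the function names.

```python
import numpy as np

def stft_magnitude(x, n_fft, hop):
    """Magnitude STFT of a 1-D signal with a Hann window and no padding."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (num_frames, n_fft // 2 + 1)

def multi_scale_log_stft(x, fft_sizes=(512, 1024, 2048), eps=1e-5):
    """Hypothetical multi-resolution log-magnitude spectrograms.

    In an MS-STFT discriminator, each spectrogram would feed its own
    sub-discriminator; those networks are not modeled here.
    """
    return [np.log(stft_magnitude(x, n, n // 4) + eps) for n in fft_sizes]
```

Short windows resolve transients and frame-edge artifacts, while long windows resolve fine spectral detail. A discriminator that sees all scales can penalize both kinds of error.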
This paper proposes an end-to-end neural audio coder based on a mean-scale hyperprior model together with a perceptual optimization using a psychoacoustic model (PAM)-based loss function. The proposed coder estimates ...
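In a mean-scale hyperprior model, the hyperprior predicts a mean and a scale for each latent. Each quantized latent is then coded with the probability mass of that Gaussian over its quantization bin. The pure-Python sketch below estimates the resulting bit cost. It is the generic mean-scale technique, not this paper's specific coder. Two details are assumptions:

- Here `mu` and `sigma` are given directly rather than predicted by a network.
- The scale floor of 0.11 is a common convention in learned-compression code, not a value from this paper.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mean_scale_bits(y_hat, mu, sigma, min_scale=0.11, min_prob=1e-9):
    """Estimated bits for integer latents under a discretized Gaussian.

    Each symbol's probability is the Gaussian mass on [y - 0.5, y + 0.5] with
    mean mu and scale sigma (ASSUMED here to be given by a hyperprior).
    """
    total = 0.0
    for y, m, s in zip(y_hat, mu, sigma):
        s = max(s, min_scale)  # floor the scale for numerical stability
        upper = normal_cdf((y + 0.5 - m) / s)
        lower = normal_cdf((y - 0.5 - m) / s)
        total -= math.log2(max(upper - lower, min_prob))  # clamp tiny probabilities
    return total
```

An accurate mean prediction with a small scale makes a latent nearly free to code. A mispredicted mean makes it expensive, which is why better hyperprior or context modeling lowers the bitrate.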