Search Results - Inner Mongolia University Library

Optimizations of Neural Audio Coder Toward Perceptual Transparency

IEEE Journal of Selected Topics in Signal Processing, 2024, vol. 18, no. 8, pp. 1531-1543

Authors: Byun, Joon; Shin, Seungmin; Hwang, Seorim; Sung, Jongmo; Beack, Seungkwon; Park, Youngcheol. Affiliations: Yonsei Univ, Comp Sci Dept, Wonju 26493, South Korea; Elect & Telecommun Res Inst, Daejeon 34129, South Korea; Yonsei Univ, Software Div, Wonju 26493, South Korea

This paper presents comprehensive optimizations of a neural audio coder built upon a variational autoencoder (VAE) system integrated with an arithmetic coder. Our optimizations focus on two primary aspects: a novel loss function design and advanced entropy modeling of bottleneck latent embeddings. The loss function design incorporates parameters from a psychoacoustic model (PAM) into the frame-wise distortion measure, providing excellent perceptual quality. In addition, a multi-time scale discriminator is utilized to minimize distortions across adjacent frames, reducing artifacts at frame edges. Also, the coder is optimized considering three sophisticated entropy models within the latent domain: the Factorized Entropy Model (FEM), the Hyperprior Model (HPM), and the Joint Hierarchical Model (JHM). Notably, the JHM enhances context modeling across frames to effectively predict components influenced by long-term dependencies. To verify the optimization performance, we conducted extensive experiments using a dataset consisting of commercial movie clips and two additional public datasets. Objective metrics consistently demonstrated that our optimized loss function and latent modeling achieved superior performance across all test datasets compared to traditional codecs such as LAME-MP3 and FDK-AAC. Subjective assessments also indicated that our system could offer comparable or superior auditory quality to FDK-AAC.
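The abstract states only that PAM parameters enter the frame-wise distortion measure, not the exact formula. One common way to express that idea is a noise-to-mask ratio, sketched below. The function name, the per-band inputs, and the averaging rule are all assumptions for illustration and are not taken from the paper.

```python
import math


def noise_to_mask_ratio_db(noise_energy, mask_threshold, eps=1e-12):
    """Mean noise-to-mask ratio of one frame, in dB (hypothetical helper).

    noise_energy:   per-band energy of the coding error (assumed input)
    mask_threshold: per-band masking threshold from a PAM (assumed input)
    """
    if len(noise_energy) != len(mask_threshold):
        raise ValueError("both inputs must have one value per band")
    # Error energy is weighted by how audible it is: noise below the
    # masking threshold contributes a ratio smaller than 1.
    ratios = [n / max(m, eps) for n, m in zip(noise_energy, mask_threshold)]
    mean_ratio = sum(ratios) / len(ratios)
    return 10.0 * math.log10(max(mean_ratio, eps))
```

Under this measure a value at or below 0 dB means the average error sits at or under the masking threshold, which is the sense in which a PAM-based loss targets perceptual transparency rather than raw waveform error.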

Keywords: Psychoacoustic models; Entropy; Encoding; Distortion measurement; Optimization; Distortion; Noise; neural audio coder; optimization; psychoacoustic model; perceptual loss function; entropy model; hyperprior

Quantization Noise Masking in Perceptual Neural Audio Coder

49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Shin, Seungmin; Byun, Joon; Sung, Jongmo; Beack, Seungkwon; Park, Youngcheol. Affiliations: Yonsei Univ, Intelligent Signal Proc Lab, Wonju, South Korea; Elect & Telecommun Res Inst (ETRI), Daejeon, South Korea

ISBN: (print) 9798350344868; 9798350344851

This study investigates the implication of utilizing the psychoacoustic model (PAM) within the neural audio coder (NAC), specifically focusing on the masking of quantization noise. We introduce a novel training strategy to incorporate the PAM into the NAC more accurately. This method involves a discriminator that directly or indirectly measures the PAM loss. For the indirect measurement, a multi-scale STFT discriminator (MS-STFTD) is incorporated to introduce an auxiliary loss term in addition to the existing PAM loss. Conversely, for the direct measurement, we have designed a multi-scale PAM discriminator (MS-PAMD) that quantifies PAM-specific parameters. Experimental results show that adding the discriminator masks the quantization noise better than the previous NAC, and it obtains audio quality comparable to the commercial AAC in both objective and subjective scores.
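A multi-scale STFT discriminator looks at the same signal through several time-frequency resolutions. The sketch below shows only that analysis step (magnitude spectrograms at several window sizes), not the discriminator network or the paper's configuration. The window sizes, the hop of one quarter window, and both function names are assumptions for illustration.

```python
import cmath
import math


def stft_magnitudes(signal, window_size, hop):
    """Magnitude STFT using a Hann window and a direct DFT (hypothetical helper)."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / window_size)
              for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_size)]
        bins = []
        # Only non-negative frequencies are kept for a real-valued input.
        for k in range(window_size // 2 + 1):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n in range(window_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def multi_scale_stft(signal, window_sizes=(16, 32, 64)):
    """One magnitude spectrogram per window size (sizes are assumed values)."""
    return {w: stft_magnitudes(signal, w, w // 4) for w in window_sizes}


# Usage: a sine with a period of 8 samples, analyzed at three resolutions.
tone = [math.sin(2.0 * math.pi * n / 8.0) for n in range(128)]
scales = multi_scale_stft(tone)
```

Short windows resolve frame-edge artifacts in time, while long windows resolve tonal detail in frequency, which is the usual motivation for combining several scales.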

Keywords: neural audio coder; PAM; quantization noise masking; generative adversarial network

A Perceptual Neural Audio Coder with a Mean-Scale Hyperprior

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Byun, Joon; Shin, Seungmin; Park, Youngcheol; Sung, Jongmo; Beack, Seungkwon. Affiliations: Yonsei University, Intelligent Signal Processing Lab., Wonju, Republic of Korea; Daejeon, Republic of Korea

ISBN: (print) 9781728163277

This paper proposes an end-to-end neural audio coder based on a mean-scale hyperprior model together with a perceptual optimization using a psychoacoustic model (PAM)-based loss function. The proposed coder estimates the mean and scale hyperpriors using a sub-network after assuming that the probability distribution of latent samples is Gaussian. The main network is an autoencoder based on ResNet-type gated linear units (ResGLUs), each comprising a generalized divisive normalization (GDN) layer. We train both networks to optimize perceptual attributes estimated using a multi-timescale scheme to obtain high perceptual quality. Experimental results show that the proposed model accurately predicts the mean and scale hyperpriors. Also, it obtains consistently higher audio quality than the commercial MP3 audio coder at all bitrates. © 2023 IEEE.
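To illustrate why accurate mean and scale predictions matter, the sketch below estimates the bit cost of quantized latents under a Gaussian model. It assumes rounding to integers and unit-width quantization bins, which is a common convention in learned compression but is not stated in the abstract. All function names are hypothetical.

```python
import math


def gaussian_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def bin_probability(q, mean, scale):
    """Probability mass of the unit-width bin centered on integer q (assumed bin width)."""
    upper = gaussian_cdf(q + 0.5, mean, scale)
    lower = gaussian_cdf(q - 0.5, mean, scale)
    return max(upper - lower, 1e-12)  # floor avoids log(0)


def estimated_bits(latents, means, scales):
    """Ideal code length, in bits, that an arithmetic coder could approach."""
    return sum(-math.log2(bin_probability(round(y), m, s))
               for y, m, s in zip(latents, means, scales))


# Usage: the same latents cost fewer bits when the predicted means are close.
latents = [2.1, -3.0, 0.4]
good = estimated_bits(latents, [2.0, -3.0, 0.0], [0.5, 0.5, 0.5])
bad = estimated_bits(latents, [0.0, 0.0, 0.0], [0.5, 0.5, 0.5])
```

In this sketch the means and scales are passed in directly; in a hyperprior coder they would come from the sub-network and be reproduced at the decoder from side information.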

Keywords: Hyperprior; neural audio coder; PAM; Perceptual Loss Function
