Search Results - Inner Mongolia University Library

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Vasilache, Adriana; Pihlajakuja, Tapani; Laitinen, Mikko-Ville. Affiliations: Nokia Technologies, Tampere, Finland; Nokia Technologies, Espoo, Finland

ISBN: (print) 9798350368741

The 3GPP Immersive Voice and Audio Services (IVAS) codec enables mobile spatial communication through coding of the metadata-assisted spatial audio (MASA) format. The MASA format is a new parametric audio format designed for spatial audio capture and representation from varying mobile device microphone arrays. The MASA format is composed of one or two audio transport channels and associated spatial metadata. The amount of uncompressed data in MASA can be up to almost 2 Mbps, including the 422.4 kbps of spatial metadata. As the IVAS codec supports bitrates ranging from 13.2 kbps to 512 kbps, the compression rate goes from as low as 0.7% to 25%. This paper presents the flexible coding flow of the MASA format in IVAS to support the entire bitrate range. It also describes the bitrate allocation between transport channels and metadata, and the techniques employed for reducing the metadata prior to coding while maintaining perceptually representative information. The encoding performance is demonstrated with both objective and subjective measures. © 2025 IEEE.

Keywords: 3GPP; IVAS; audio coding; MASA; Metadata; Spatial audio
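
The bitrate figures quoted in this abstract can be sanity-checked with a few lines of arithmetic. The sketch below is only a rough check, not the authors' calculation: it assumes (the abstract does not say so) that the audio part of the uncompressed stream is two transport channels of 48 kHz, 16-bit PCM. Under that assumption the total comes to about 1958 kbps, and the 13.2 kbps and 512 kbps operating points come to roughly 0.7% and 26% of it, close to the range stated in the abstract.

```python
# Back-of-the-envelope check of the bitrate figures quoted in the abstract.
# ASSUMPTION (not stated in the abstract): the audio part of the uncompressed
# MASA stream is two transport channels of 48 kHz, 16-bit PCM.

SAMPLE_RATE_HZ = 48_000   # assumed sampling rate
BITS_PER_SAMPLE = 16      # assumed PCM word length
TRANSPORT_CHANNELS = 2    # the abstract says "one or two" transport channels
METADATA_KBPS = 422.4     # spatial metadata rate, taken from the abstract


def uncompressed_kbps(channels: int = TRANSPORT_CHANNELS) -> float:
    """Raw bitrate in kbps: PCM transport channels plus spatial metadata."""
    audio_kbps = channels * SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000
    return audio_kbps + METADATA_KBPS


def compression_percent(codec_kbps: float, raw_kbps: float) -> float:
    """Coded bitrate expressed as a percentage of the raw bitrate."""
    return 100.0 * codec_kbps / raw_kbps


if __name__ == "__main__":
    raw = uncompressed_kbps()
    print(f"uncompressed: {raw:.1f} kbps")
    for rate in (13.2, 512.0):  # lowest and highest IVAS bitrates in the abstract
        print(f"{rate:6.1f} kbps -> {compression_percent(rate, raw):.2f}% of raw")
```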

OpenACE: An Open Benchmark for Evaluating Audio Coding Performance

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Coldenhoff, Jozef; Granqvist, Niclas; Cernak, Milos. Affiliation: Logitech Europe S.A., Lausanne, Switzerland

ISBN: (print) 9798350368741

Audio and speech coding lack unified evaluation and open-source testing. Many candidate systems were evaluated on proprietary, non-reproducible, or small data, and machine learning-based codecs are often tested on datasets whose distributions resemble their training data, which makes comparisons unfair to digital signal processing-based codecs that usually work well with unseen data. This paper presents a full-band audio and speech coding quality benchmark with more variable content types, including traditional open test vectors. An example use case of audio coding quality assessment is presented with open-source Opus, 3GPP's EVS, and ETSI's recent LC3 with LC3+ used in Bluetooth LE audio profiles. In addition, quality variations of emotional speech encoding at 16 kbps are shown. The proposed open-source benchmark contributes to audio and speech coding democratization and is available at https://***/JozefColdenhoff/OpenACE. © 2025 IEEE.

Keywords: audio coding; benchmarks; deep learning; speech processing

DRA Audio Coding Standard

Chinese Journal of Electronics, 2025, Vol. 23, No. 3, pp. 521-526

Authors: Wenhua Ma; Jing Xu; Yuanzhe Ma; Yuli You. Affiliations: Department of Computers, Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou, China; Department of Science and Technology, Guangdong Rising Assets Management Co., Guangzhou, China; Department of Biomedical Science and Technology, South China University of Technology, Guangzhou, China; Guangdong Provincial Key Laboratory for Digital Audio Technology, Guangzhou, China

The DRA (Dynamic Resolution Adaptation) audio coding standard was shown to deploy transient-localized MDCT to effectively suppress pre-echo artifacts and statistical allocation of codebooks to improve the compression efficiency of Huffman coding. Its quantizers and Huffman codebooks are designed in such a way that a signal path of 24 bits is provided throughout the codec, so that high audio quality can be delivered if the bit rate suffices. Although simple, it delivers state-of-the-art compression efficiency, as shown by five rounds of ITU-R BS.1116 compliant subjective listening tests.

Keywords: Codecs; Audio coding; Bit rate; Dynamic scheduling; Resource management; Discrete cosine transforms; Huffman coding; Standards; Signal resolution; Commercialization

FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Nicola Pia; Martin Strauss; Markus Multrus; Bernd Edler. Affiliations: Fraunhofer IIS, Erlangen, Germany; International Audio Laboratories Erlangen, Erlangen, Germany

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

This paper introduces FlowMAC, a novel neural audio codec for high-quality general audio compression at low bit rates based on conditional flow matching (CFM). FlowMAC jointly learns a mel spectrogram encoder, quantizer, and decoder. At inference time, the decoder integrates a continuous normalizing flow via an ODE solver to generate a high-quality mel spectrogram. This is the first time that a CFM-based approach has been applied to general audio coding, enabling scalable, simple, and memory-efficient training. Our subjective evaluations show that FlowMAC at 3 kbps achieves quality similar to that of state-of-the-art GAN-based and DDPM-based neural audio codecs at double the bit rate. Moreover, FlowMAC offers a tunable inference pipeline, which allows complexity to be traded off against quality. This enables real-time coding on CPU while maintaining high perceptual quality.

Keywords: Training; Codecs; Audio coding; Speech coding; Bit rate; Real-time systems; Decoding; Complexity theory; Speech processing; Spectrogram

OpenACE: An Open Benchmark for Evaluating Audio Coding Performance

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Jozef Coldenhoff; Niclas Granqvist; Milos Cernak. Affiliation: Logitech Europe S.A., Lausanne, Switzerland

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

Audio and speech coding lack unified evaluation and open-source testing. Many candidate systems were evaluated on proprietary, non-reproducible, or small data, and machine learning-based codecs are often tested on datasets whose distributions resemble their training data, which makes comparisons unfair to digital signal processing-based codecs that usually work well with unseen data. This paper presents a full-band audio and speech coding quality benchmark with more variable content types, including traditional open test vectors. An example use case of audio coding quality assessment is presented with open-source Opus, 3GPP's EVS, and ETSI's recent LC3 with LC3+ used in Bluetooth LE audio profiles. In addition, quality variations of emotional speech encoding at 16 kbps are shown. The proposed open-source benchmark contributes to audio and speech coding democratization and is available at https://***/JozefColdenhoff/OpenACE.

Keywords: Codecs; Speech coding; Audio coding; Working environment noise; Benchmark testing; Data augmentation; Data models; Vectors; Reverberation; Speech processing

Perceptual Audio Coding: A 40-Year Historical Perspective

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Jürgen Herre; Schuyler Quackenbush; Minje Kim; Jan Skoglund. Affiliations: International Audio Laboratories Erlangen, Erlangen, Germany; Audio Research Labs, Westfield, NJ; University of Illinois at Urbana-Champaign, Urbana, IL; Google LLC, San Francisco, CA

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

In the history of audio and acoustic signal processing, perceptual audio coding has certainly excelled as a bright success story by its ubiquitous deployment in virtually all digital media devices, such as computers, tablets, mobile phones, set-top-boxes, and digital radios. From a technology perspective, perceptual audio coding has undergone tremendous development from the first very basic perceptually driven coders (including the popular mp3 format) to today’s full-blown integrated coding/rendering systems. This paper provides a historical overview of this research journey by pinpointing the pivotal development steps in the evolution of perceptual audio coding. Finally, it provides thoughts about future directions in this area.

Keywords: Computers; Audio coding; Speech coding; Media; Digital communication; Mobile handsets; Acoustics; History; Speech processing; Digital audio players

Variable Bitrate Residual Vector Quantization for Audio Coding

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Yunkee Chae; Woosung Choi; Yuhta Takida; Junghyun Koo; Yukara Ikemiya; Zhi Zhong; Kin Wai Cheuk; Marco A. Martínez-Ramírez; Kyogu Lee; Wei-Hsiang Liao; Yuki Mitsufuji. Affiliations: Sony AI IPAI Sony Group Corporation Tokyo Japan AIIS Department of Intelligence and Information Seoul National University

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly in scenarios with simple input audio, such as silence. To address this limitation, we propose variable bitrate RVQ (VRVQ) for audio codecs, which allows for more efficient coding by adapting the number of codebooks used per frame. Furthermore, we propose a gradient estimation method for the non-differentiable masking operation that transforms the importance map into the binary importance mask, improving model training via a straight-through estimator. We demonstrate that the proposed training framework achieves superior results compared to the baseline method and shows further improvement when applied to the current state-of-the-art codec. Audio samples are available at: https://***/***/

Keywords: Training; Adaptation models; Codecs; Audio coding; Vector quantization; Bit rate; Rate-distortion; Estimation; Transforms; Vectors
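
The idea of spending a different number of RVQ codebooks on each frame can be illustrated with a small, untrained sketch. It is not the VRVQ model from this record: the codebooks are random, and the per-frame stage count comes from a hypothetical energy threshold (`stages_for`) standing in for the learned importance map described in the abstract. All function names are illustrative.

```python
import random


def make_codebooks(n_stages, size, dim, seed=0):
    """Random codebooks with shrinking scale; each includes the zero vector."""
    rng = random.Random(seed)
    books = []
    for stage in range(n_stages):
        scale = 0.5 ** stage
        book = [[0.0] * dim]  # zero codeword: a stage can never add error
        book += [[rng.uniform(-scale, scale) for _ in range(dim)]
                 for _ in range(size - 1)]
        books.append(book)
    return books


def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best_i, best_d = 0, float("inf")
    for i, cw in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, cw))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def rvq_encode(frame, codebooks, n_stages):
    """Each stage quantizes the residual left by the previous stages."""
    residual = list(frame)
    indices = []
    for book in codebooks[:n_stages]:
        i = nearest(book, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, book[i])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected codewords; fewer indices means a coarser result."""
    out = [0.0] * len(codebooks[0][0])
    for i, book in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, book[i])]
    return out


def stages_for(frame, max_stages, energy_floor=1e-4):
    """Hypothetical stand-in for a learned importance map: 1 stage if near-silent."""
    energy = sum(x * x for x in frame) / len(frame)
    return 1 if energy < energy_floor else max_stages


def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

With this rule a silent frame costs one index while a louder frame uses all stages, so the bitrate varies per frame; in a trained system the decision would be learned rather than thresholded.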

Audio coding using transform-domain weighted interleave vector quantization (Twin VQ)

ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 1998, Vol. 81, No. 3, pp. 1-9

Authors: Iwakami, N; Moriya, T; Miki, S; Ikeda, K; Jin, A. Affiliation: NTT Corp, Human Interface Labs, Musashino, Tokyo 180, Japan

Twin VQ (transform-domain weighted interleave vector quantization) is a method that encodes wideband acoustic signals at a low bit rate. It is a transform coding scheme whose basic structure transforms the input signal to the frequency domain by MDCT; vector quantization is applied after flattening. This encoding method has characteristic features such as weighted interleave vector quantization, normalization of the frequency characteristics by the linearly predicted spectrum, and interframe prediction in the frequency domain. High performance is realized especially at lower bit rates. Another feature is robustness against errors, since adaptive bit assignment is not applied. (C) 1998 Scripta Technica.

Keywords: audio coding; transform coding; vector quantization; linear prediction
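
As a rough illustration of the "weighted interleave" step named in this abstract, the sketch below splits a flattened coefficient vector into interleaved subvectors and picks a codeword by weighted squared error. It is a simplified reading of the abstract, not the published Twin VQ algorithm: the function names, the codebook, and the weights are all hypothetical.

```python
def interleave_split(coeffs, n_sub):
    """Subvector j takes every n_sub-th coefficient, starting at offset j."""
    return [coeffs[j::n_sub] for j in range(n_sub)]


def interleave_merge(subvectors):
    """Inverse of interleave_split: put each subvector back at its offset."""
    n_sub = len(subvectors)
    out = [0.0] * sum(len(s) for s in subvectors)
    for j, sub in enumerate(subvectors):
        out[j::n_sub] = sub
    return out


def weighted_nearest(vec, weights, codebook):
    """Index of the codeword minimizing the weighted squared error.

    The weights stand in for a perceptual weighting; how they would be
    derived is outside this sketch.
    """
    def cost(i):
        return sum(w * (v - c) ** 2
                   for v, w, c in zip(vec, weights, codebook[i]))
    return min(range(len(codebook)), key=cost)
```

Because each subvector samples the whole spectrum at a fixed stride, every subvector sees a similar mix of low and high frequencies, which is one common motivation for interleaving before vector quantization.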

Audio coding and image denoising based on the nonuniform modulated complex lapped transform

IEEE TRANSACTIONS ON MULTIMEDIA, 2005, Vol. 7, No. 5, pp. 817-827

Authors: Cheng, S; Xiong, ZX. Affiliation: Texas A&M Univ, Dept Elect Engn, College Stn, TX 77840 USA

Xiong and Malvar recently introduced a nonuniform modulated complex lapped transform (NMCLT) with good time-localization and controllable frequency resolution by using an oversampled nonuniform filter bank to generate its real and imaginary components. In this paper, we first show that oversampling in the NMCLT is not necessary in theory but is a by-product of fast implementation in practice. We also point out that the amount of oversampling, which can be flexibly controlled, depends on the application. We then describe in detail the implementation of the inverse transform, which was not addressed clearly by Xiong and Malvar. We present the first applications of the NMCLT to audio coding and image denoising. A scalable audio coder has been implemented by controlling the amount of oversampling and exploiting redundancy among the NMCLT coefficients via predictive coding. Experimental results show that the audio coder reduces pre-echoes and improves the sound quality of audio clips with transient sounds. A simple denoising algorithm based on the NMCLT has also been devised to provide images with better visual quality than those obtained with wavelet-based soft thresholding.

Keywords: audio coding; image denoising; lapped orthogonal transform; modulated lapped transform

Audio Coding Using Overlap and Kernel Adaptation

IEEE SIGNAL PROCESSING LETTERS, 2016, Vol. 23, No. 5, pp. 589-593

Authors: Helmrich, Christian R.; Edler, Bernd. Affiliations: Fraunhofer IIS, Int Audio Labs Erlangen, D-91058 Erlangen, Germany; Univ Erlangen Nurnberg, D-91058 Erlangen, Germany

Perceptual audio coding schemes typically apply the modified discrete cosine transform (MDCT) with different lengths and windows, and utilize signal-adaptive switching between these on a per-frame basis for best subjective performance. In previous papers, the authors demonstrated that further quality gains can be achieved for some input signals using additional transform kernels such as the modified discrete sine transform (MDST) or greater inter-transform overlap by means of a modified extended lapped transform (MELT). This work discusses the algorithmic procedures and codec modifications necessary to combine all of the above features (transform length, window shape, transform kernel, and overlap ratio switching) into a flexible input-adaptive coding system. It is shown that, due to full time-domain aliasing cancelation, this system supports perfect signal reconstruction in the absence of quantization and, thanks to fast realizations of all transforms, increases the codec complexity only negligibly. The results of a 5.1 multichannel listening test are also reported.

Keywords: audio coding; lapped transform; modified discrete cosine transform (MDCT); modified discrete sine transform (MDST)
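
The claim of perfect reconstruction through time-domain aliasing cancelation can be checked numerically for the plain MDCT. The sketch below is a direct O(N^2) implementation with a fixed block length and a sine window, chosen here for brevity; it does not implement the kernel or overlap switching described in the abstract. It assumes the signal length is a multiple of `N`.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block, window):
    """Windowed MDCT: 2N time samples -> N coefficients."""
    N = len(block) // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    return [window[n] * (2.0 / N)
            * sum(coeffs[k]
                  * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                  for k in range(N))
            for n in range(2 * N)]


def analysis_synthesis(signal, N):
    """MDCT then IMDCT with 50% overlap-add and no quantization."""
    window = sine_window(N)
    # Pad with N zeros on each side so every sample lies in two blocks.
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        y = imdct(mdct(padded[start:start + 2 * N], window), window)
        for n in range(2 * N):
            out[start + n] += y[n]  # aliasing cancels between neighbors
    return out[N:N + len(signal)]
```

Each block on its own returns an aliased signal; only the sum of two overlapping blocks is alias-free, which is why the padding at both ends is needed for the first and last `N` samples.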
