Search Results - Inner Mongolia University Library

Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding

IEEE SIGNAL PROCESSING LETTERS, 2020, vol. 27, pp. 2159-2163

Authors: Zhen, Kai; Lee, Mi Suk; Sung, Jongmo; Beack, Seungkwon; Kim, Minje. Affiliations: Indiana Univ Dept Comp Sci Bloomington IN 47408 USA; Indiana Univ Cognit Sci Program Bloomington IN 47408 USA; Elect & Telecommun Res Inst Daejeon 34129 South Korea; Indiana Univ Dept Intelligent Syst Engn Bloomington IN 47408 USA

Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present a psychoacoustic calibration scheme to re-define the loss functions of neural audio coding systems so that they can decode signals more perceptually similar to the reference, yet with a much lower model complexity. The proposed loss function incorporates the global masking threshold, which permits reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms a baseline neural codec that is twice as large and consumes 23.4% more bits per second. With the proposed method, a lightweight neural codec, with only 0.9 million parameters, performs near-transparent audio coding comparable with the commercial MPEG-1 Audio Layer III codec at 112 kbps.
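As a rough illustration of the masking-threshold idea in the abstract above, the following Python sketch penalizes only the part of the per-bin error power that exceeds a masking threshold. It is not the loss function from the paper: the function names, the hinge formulation, and the example threshold values are all assumptions made for illustration.

```python
def plain_mse(reference, decoded):
    """Ordinary mean squared error over spectral bins (no perception model)."""
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def masked_loss(reference, decoded, mask_power):
    """Average error power above a per-bin masking threshold.

    reference, decoded: spectral magnitudes, one value per frequency bin.
    mask_power: hypothetical masking threshold per bin, as linear power.
    Error power at or below the threshold is treated as inaudible and ignored.
    """
    if not (len(reference) == len(decoded) == len(mask_power)):
        raise ValueError("all inputs must have the same number of bins")
    excess = [
        max(0.0, (r - d) ** 2 - m)
        for r, d, m in zip(reference, decoded, mask_power)
    ]
    return sum(excess) / len(excess)


if __name__ == "__main__":
    # Made-up example values, not data from the paper.
    reference = [1.0, 0.5, 0.2, 0.1]
    decoded = [0.9, 0.5, 0.35, 0.1]
    mask = [0.05, 0.01, 0.01, 0.001]
    print("plain MSE  :", plain_mse(reference, decoded))
    print("masked loss:", masked_loss(reference, decoded, mask))
```

In this toy case the error in the first bin falls under its threshold and contributes nothing, so the masked loss is smaller than the plain MSE. A trained codec would presumably be free to spend fewer bits on such bins.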

Keywords: Decoding; Masking threshold; Psychoacoustics; Bit rate; Quantization (signal); Kernel; Audio coding; deep neural networks; network compression

ENHANCED METHOD OF AUDIO CODING USING CNN-BASED SPECTRAL RECOVERY WITH ADAPTIVE STRUCTURE

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Shin, Seong-Hyeon; Beack, Seung Kwon; Lim, Wootaek; Park, Hochong. Affiliations: Kwangwoon Univ Seoul South Korea; Elect & Telecommun Res Inst Daejeon South Korea

ISBN: (print) 9781509066315

A process of spectral recovery can enhance the performance of transform-based audio coding by transmitting only a portion of spectral data and recovering the missing spectral data in the decoder. This study proposes an enhanced method of audio coding based on spectral recovery with an adaptive structure that yields improved sound quality compared with the previous method. The spectral data to be recovered are arranged in an adaptive pattern depending on the difficulty of recovery. In addition, according to the spectral characteristics, prior information associated with these spectral data is selectively transmitted to help a neural network improve the performance of magnitude recovery. Prior information also provides the signs of recovered magnitudes. A subjective performance evaluation shows that, for mono coding without window switching at 40 kbps, the proposed coding method provides better sound quality than the conventional method on average.

Keywords: adaptive structure; audio coding; autoencoder; convolutional neural network; spectral recovery

Audio coding via EMD

DIGITAL SIGNAL PROCESSING, 2020, vol. 104, p. 102770

Authors: Boudraa, Abdel-Ouahab; Khaldi, Kais; Chonavel, Thierry; Hadj-Alouane, Mounia Turki; Komaty, Ali. Affiliations: Ecole Navale Arts & Metiers Inst Technol IRENav CC 600 F-29240 Brest 9 France; Jouf Univ Coll Sci & Arts Tabarjal POB 2014 Al Jouf 42421 Sakaka Saudi Arabia; IMT Atlantique LabSTICC UMR 6285 BP 832 F-29285 Brest France; El Manar Univ ENIT U2S BP 37 Tunis 1002 Tunisia; Univ Sci & Arts USAL Lebanon Beirut Lebanon

In this paper, an audio coding scheme based on the empirical mode decomposition (EMD) in association with a psychoacoustic model is presented. The principle of the method consists in adaptively breaking down the audio signal into intrinsic oscillatory components, called Intrinsic Mode Functions (IMFs), that are fully described by their local extrema. These extrema are encoded. The coding is carried out frame by frame and no assumption is made about the signal to be coded. The number of allocated bits varies from mode to mode and obeys the coding-error inaudibility constraint. Due to the symmetry of an IMF, only the extrema (maxima or minima) of one of its interpolating envelopes are perceptually coded. In addition, to deal with rapidly changing audio signals, a stationarity index is used, and when a transient is detected, the frame is split into two overlapping sub-frames. At the decoder side, the IMFs are recovered using the associated coded maxima, and the original signal is reconstructed by summing the IMFs. The performance of the proposed coding is analyzed and compared to that of the MP3 and AAC codecs and the wavelet-based coding approach. Based on the analyzed mono audio signals, the obtained results show that the proposed coding scheme outperforms the MP3 and the wavelet-based coding methods and performs slightly better than the AAC codec, thus showing the potential of the EMD for data-driven audio coding. (C) 2020 Elsevier Inc. All rights reserved.

Keywords: Empirical mode decomposition; Empirical mode compression; Audio coding; Sub-band coding; Stationarity index; Psychoacoustic model

Low Delay Robust Audio Coding by Noise Shaping, Fractional Sampling, and Source Prediction

Data Compression Conference (DCC)

Author: Jan Østergaard. Affiliation: Aalborg University Aalborg Denmark

It was recently shown that the combination of source prediction, two-times oversampling, and noise shaping can be used to obtain a robust (multiple-description) audio coding framework for networks with packet loss probabilities less than 10%. Specifically, it was shown that audio signals could be encoded into two descriptions (packets), which were separately sent over a communication channel. Each description yields a desired performance by itself, and when they are combined, the performance is improved. This paper extends the previous work to an arbitrary number of descriptions (packets) by using fractional oversampling and a new decoding principle. We demonstrate that, due to source aliasing, existing MSE-optimized rules for reconstruction from noisy sampled data perform poorly from a perceptual point of view. A simple reconstruction rule is proposed that improves the PEAQ objective difference grades (ODG) by more than 2 points. The proposed audio coder enables low-delay, high-quality audio streaming on networks with late packet arrivals or packet losses. With a coding delay of 2.5 ms and a total bitrate of 300 kbps, it is demonstrated that mean PEAQ ODGs around -0.65 can be obtained for 48 kHz (mono) music (pop & rock) and packet loss probabilities of 20%.

Keywords: Audio coding; Bit rate; Packet loss; Data compression; Communication channels; Rocks; Noise shaping

AUDIO CODING BASED ON SPECTRAL RECOVERY BY CONVOLUTIONAL NEURAL NETWORK

44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Shin, Seong-Hyeon; Beack, Seung Kwon; Lee, Taejin; Park, Hochong. Affiliations: Kwangwoon Univ Seoul South Korea; Elect & Telecommun Res Inst Daejeon South Korea

ISBN: (print) 9781479981311

This study proposes a new method of audio coding based on spectral recovery, which can enhance the performance of transform audio coding. An encoder represents spectral information of an input in a time-frequency domain and transmits only a portion of it so that the remaining spectral information can be recovered based on the transmitted information. A decoder recovers the magnitudes of missing spectral information using a convolutional neural network. The signs of missing spectral information are either transmitted or randomly assigned, according to their importance. By combining transmission and recovery of spectral information, the proposed method can enhance the coding performance, compared with conventional transform coding. The subjective performance evaluation shows that, for mono coding at 39.4 kbps, the proposed method provides higher sound quality than the USAC, by an average MUSHRA score of 8.5.

Keywords: audio coding; convolutional neural network; spectral recovery; transform coding

IMMERSIVE AUDIO CODING FOR VIRTUAL REALITY USING A METADATA-ASSISTED EXTENSION OF THE 3GPP EVS CODEC

44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: McGrath, D.; Bruhn, S.; Purnhagen, H.; Eckert, M.; Torres, J.; Brown, S.; Darcy, D. Affiliations: Dolby Australia Pty Ltd Sydney NSW Australia; Dolby Sweden AB Stockholm Sweden; Dolby Labs Inc San Francisco CA USA

ISBN: (print) 9781479981311

Virtual Reality (VR) audio scenes may be composed of a very large number of audio elements, including dynamic audio objects, fixed audio channels and scene-based audio elements such as Higher Order Ambisonics (HOA). Potentially, the subjective listening experience may be replicated using a compact spatial format with a set number of dynamic objects and scene-based elements, retaining only the perceptual essence of the audio scene. The compact format would further enable a reduction in the complexity of subsequent compression and rendering. This paper investigates these hypotheses by exploring the use of a compact format that consists of up to four dynamic objects and nine HOA channels, with the Enhanced Voice Services (EVS) codec being applied to a 4-channel down-mix of the compact format.

Keywords: Audio coding; Virtual Reality; Spatial audio; Immersive audio; Ambisonics

The Novel Improving Algorithms on DRA Audio Entropy Coding

IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC)

Authors: Jianxin Yan; Lei Wang. Affiliation: National Engineering Laboratory for Digital Audio Coding Technology Guangzhou China

ISBN: (electronic) 9781728172026

ISBN: (print) 9781728172033

DRA (Digital Rise Audio) is a specification for multi-channel digital audio coding technology that was issued as a Chinese national standard in 2008. Three novel algorithms, namely differential Huffman, context-based Huffman, and context-based arithmetic entropy coding, are presented to improve the compression efficiency of DRA entropy coding by 3%, 11.11%, and 13.52%, respectively, at the expense of different implementation complexities and of intra-frame and/or inter-frame error propagation in live transmission.
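To make the trade-off named in this abstract concrete, here is a minimal Python sketch of generic differential coding ahead of an entropy coder. It is not the DRA algorithm or the paper's differential Huffman scheme; the helper names and the sample index sequence are hypothetical. It only shows that differencing a slowly varying sequence can lower its empirical entropy, and that decoding by accumulation is what lets a single corrupted value propagate.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def differences(values):
    """Keep the first value, then store each successive difference."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def undo_differences(diffs):
    """Rebuild the original values by running accumulation."""
    restored = []
    running = 0
    for d in diffs:
        running += d
        restored.append(running)
    return restored


if __name__ == "__main__":
    # Hypothetical quantization indices that drift slowly upward.
    indices = [10, 11, 12, 12, 13, 14, 14, 15, 16, 17]
    diffs = differences(indices)
    print("raw entropy  :", entropy_bits(indices))
    print("diff entropy :", entropy_bits(diffs))
    print("lossless     :", undo_differences(diffs) == indices)
```

Because `undo_differences` accumulates, an error in one received difference shifts every later value in the run, which is one way to picture the error propagation the abstract mentions. Whether differencing helps at all depends on the statistics of the data.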

Keywords: Entropy coding; Encoding; Transform coding; Audio coding; Huffman coding; Entropy; Redundancy

Encoding Multichannel Audio for Ultra HDTV Based on Spatial Audio Coding with Optimization

IEEE Region 10 Symposium

Authors: Ikhwana Elfitri; Doni Nursyam; Fitrilina; Rahmadi Kurnia. Affiliation: Dept. of Electrical Eng. Andalas University Padang Indonesia

ISBN: (print) 9781538669907; 9781538669891

Ultra High Definition Television (UHDTV) has been a promising system for future TV broadcasting, where 22.2 multi-channel audio is adopted for creating three-dimensional (3D) audio in the home-user listening area. In the development stage, the Moving Picture Experts Group (MPEG) Advanced Audio Coding (AAC) standard has been chosen for audio encoding. However, this choice is questionable, as its performance deteriorates as the bit-rate decreases. In this paper, Spatial Audio Coding (SAC) with an optimization technique is proposed due to its strength in working at lower bit-rates and its backward compatibility with configurations that have fewer channels. The results of experiments show that the proposed method achieves an Objective Difference Grade (ODG) score above -1, which is rated as an excellent score, at bit-rates starting from 1200 kb/s.

Keywords: Bit rate; Bars; Audio coding; Transform coding; Optimization; IEEE Regions

Dual-band PWM Audio Coding for Ultrasound Artefact Reduction

25th European Signal Processing Conference (EUSIPCO)

Authors: Kaleris, Konstantinos; Mourjopoulos, John. Affiliation: Univ Patras Elect & Comp Engn Dept Audio & Acoust Technol Grp Patras 26500 Greece

ISBN: (print) 9780992862671

A novel approach to PWM coding is introduced, based on generating two complementary PWM streams out of the in-band signal's spectrum. The two streams are then recombined in a suitable way, so that out-of-phase cancellation of the carrier frequency harmonics is achieved. The approach suppresses the strong out-of-band frequencies of the carrier signal without introducing distortion of the in-band coded signal. Such a method can achieve superior reduction compared to the out-of-band artefact suppression induced by traditional analog low-pass filters employed in typical Class-D audio amplifiers or other switching power delivery systems, hence allowing designs with reduced filter requirements or even filterless implementations.

Keywords: Pulse Width Modulation; Class-D audio amplifier; audio coding; PWM; ultrasound

Audio Coding Using Overlap and Kernel Adaptation

IEEE SIGNAL PROCESSING LETTERS, 2016, vol. 23, no. 5, pp. 589-593

Authors: Helmrich, Christian R.; Edler, Bernd. Affiliations: Fraunhofer IIS Int Audio Labs Erlangen D-91058 Erlangen Germany; Univ Erlangen Nurnberg D-91058 Erlangen Germany

Perceptual audio coding schemes typically apply the modified discrete cosine transform (MDCT) with different lengths and windows, and utilize signal-adaptive switching between these on a per-frame basis for best subjective performance. In previous papers, the authors demonstrated that further quality gains can be achieved for some input signals using additional transform kernels such as the modified discrete sine transform (MDST) or greater inter-transform overlap by means of a modified extended lapped transform (MELT). This work discusses the algorithmic procedures and codec modifications necessary to combine all of the above features (transform length, window shape, transform kernel, and overlap ratio switching) into a flexible input-adaptive coding system. It is shown that, due to full time-domain aliasing cancelation, this system supports perfect signal reconstruction in the absence of quantization and, thanks to fast realizations of all transforms, increases the codec complexity only negligibly. The results of a 5.1 multichannel listening test are also reported.
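The perfect-reconstruction property referred to above can be checked numerically for the plain MDCT case. The sketch below is a direct (slow, O(N^2)) textbook MDCT with a sine window and 50% overlap-add, written in Python. It does not implement the kernel or overlap-ratio switching of the paper, and the function names and the zero-padding strategy are choices made here for illustration.

```python
import math


def sine_window(n_half):
    """Sine window of length 2*n_half; satisfies w[n]^2 + w[n+n_half]^2 = 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def mdct(frame, window):
    """Windowed MDCT: 2*n_half input samples -> n_half coefficients."""
    n_half = len(frame) // 2
    n0 = 0.5 + n_half / 2
    return [
        sum(window[n] * frame[n]
            * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT; the output still contains time-domain aliasing."""
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2
    return [
        window[n] * 2.0 / n_half
        * sum(coeffs[k] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]


def analyze_synthesize(signal, n_half):
    """MDCT then IMDCT with 50% overlap-add; aliasing cancels between frames."""
    window = sine_window(n_half)
    tail = (-len(signal)) % n_half  # pad the signal to a multiple of the hop
    padded = [0.0] * n_half + list(signal) + [0.0] * (tail + n_half)
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        frame = padded[start:start + 2 * n_half]
        block = imdct(mdct(frame, window), window)
        for n, value in enumerate(block):
            output[start + n] += value
    return output[n_half:n_half + len(signal)]


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) + 0.25 * math.cos(1.1 * n) for n in range(32)]
    restored = analyze_synthesize(signal, 8)
    print("max error:", max(abs(a - b) for a, b in zip(signal, restored)))
```

With no quantization between `mdct` and `imdct`, the reconstruction error should stay at floating-point rounding level. Each frame on its own is aliased, and only the overlap-add of neighbouring frames removes that aliasing.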

Keywords: audio coding; lapped transform; modified discrete cosine transform (MDCT); modified discrete sine transform (MDST)
