Search Results - Inner Mongolia University Library

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Authors: Helmrich, Christian R.; Edler, Bernd (Univ Erlangen Nurnberg, Int Audio Labs Erlangen, Wolfsmantel 33, D-91058 Erlangen, Germany)

ISBN: (print) 9781479974504

Modern stereo and multi-channel perceptual audio codecs utilizing the modified discrete cosine transform (MDCT) can achieve very good overall coding quality even at low bit-rates but lack efficiency on some material with inter-channel phase difference (IPD) of about +/-90 degrees. To address this issue a generalization of the lapped transform coding scheme is proposed which retains the perfect reconstruction property while allowing the usage of three further transform kernels, one of which is the modified discrete sine transform (MDST). Blind listening tests indicate that by frame-wise adaptation of each channel's transform kernel to the instantaneous IPD characteristics, notable gains in coding quality are possible with only negligible increase in decoder complexity and parameter rate.

Keywords: audio coding; joint stereo; MDCT; MDST
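The perfect reconstruction property mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a sine window and the textbook MDCT and MDST definitions, and it uses the same kernel for every frame; it does not reproduce the paper's frame-wise kernel switching or its additional transform kernels. The names `lapped_forward`, `lapped_inverse` and `analysis_synthesis` are hypothetical.

```python
import math


def lapped_forward(frame, window, kernel):
    """Map one windowed frame of 2N samples to N coefficients.

    kernel is math.cos for the MDCT or math.sin for the MDST.
    """
    n_coef = len(frame) // 2
    shift = 0.5 + n_coef / 2.0
    return [
        sum(window[n] * frame[n]
            * kernel(math.pi / n_coef * (n + shift) * (k + 0.5))
            for n in range(2 * n_coef))
        for k in range(n_coef)
    ]


def lapped_inverse(coefs, window, kernel):
    """Map N coefficients back to 2N windowed, time-aliased samples."""
    n_coef = len(coefs)
    shift = 0.5 + n_coef / 2.0
    return [
        window[n] * (2.0 / n_coef)
        * sum(coefs[k] * kernel(math.pi / n_coef * (n + shift) * (k + 0.5))
              for k in range(n_coef))
        for n in range(2 * n_coef)
    ]


def analysis_synthesis(signal, n_coef, kernel):
    """Transform and overlap-add with 50% overlap; aliasing cancels."""
    if len(signal) % n_coef != 0:
        raise ValueError("signal length must be a multiple of n_coef")
    # Sine window: w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition).
    window = [math.sin(math.pi * (n + 0.5) / (2 * n_coef))
              for n in range(2 * n_coef)]
    # Zero-pad by one hop on each side so every sample lies in two frames.
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        frame = padded[start:start + 2 * n_coef]
        coefs = lapped_forward(frame, window, kernel)
        for n, value in enumerate(lapped_inverse(coefs, window, kernel)):
            output[start + n] += value
    return output[n_coef:n_coef + len(signal)]


if __name__ == "__main__":
    test_signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
    for name, kernel in (("MDCT", math.cos), ("MDST", math.sin)):
        rebuilt = analysis_synthesis(test_signal, 8, kernel)
        error = max(abs(a - b) for a, b in zip(test_signal, rebuilt))
        print(name, "max reconstruction error:", error)
```

Each frame on its own contains time-domain aliasing; it only cancels when adjacent frames are overlap-added, which is why mixing kernels between frames needs the extra care the abstract describes.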

A ROBUST SPEECH/MUSIC DISCRIMINATOR FOR SWITCHED AUDIO CODING

23rd European Signal Processing Conference (EUSIPCO)

Author: Fuchs, Guillaume (Fraunhofer Inst Integrierte Schaltungen IIS, Erlangen, Germany)

ISBN: (print) 9780992862633

Switching between speech coding and generic audio coding schemes was recently shown to be very efficient for coding a wide range of audio material at low bit-rates. However, it relies strongly on a robust classification of the input signal. The aim of the paper is to design a reliable speech and music discriminator (SMD) for such an application. The main focus was on achieving a good tradeoff between accuracy, reactivity and stability of the decision while keeping the delay and complexity reasonably low. To this end, short-term and long-term features are separated before being passed to two different classifiers. The two classifier outputs are combined into a final decision using hysteresis. Objective measures show that a more reliable switching decision is achievable. The SMD was successfully implemented in MPEG Unified Speech and Audio Coding (USAC). It allows the codec to show unprecedented audio quality.

Keywords: Speech and Music-Discrimination; Speech coding; audio coding
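Combining two classifier outputs through hysteresis, as the abstract above describes, could look roughly like the sketch below. The abstract does not give the combination rule or thresholds, so the weighted average, the `enter_speech` and `leave_speech` thresholds and the function name are assumptions made for illustration only.

```python
def hysteresis_decisions(short_scores, long_scores, weight=0.5,
                         enter_speech=0.6, leave_speech=0.4,
                         initial="music"):
    """Return one 'speech' or 'music' label per frame.

    Scores are assumed to be speech probabilities in [0, 1]. The state
    only changes when the combined score crosses the threshold for the
    opposite class, so scores between the thresholds keep the old state.
    """
    state = initial
    decisions = []
    for short_score, long_score in zip(short_scores, long_scores):
        # Assumed combination rule: weighted average of both classifiers.
        combined = weight * short_score + (1.0 - weight) * long_score
        if state == "music" and combined > enter_speech:
            state = "speech"
        elif state == "speech" and combined < leave_speech:
            state = "music"
        decisions.append(state)
    return decisions


if __name__ == "__main__":
    scores = [0.2, 0.7, 0.5, 0.45, 0.3, 0.9]
    print(hysteresis_decisions(scores, scores))
```

The gap between the two thresholds trades reactivity for stability: a wider gap suppresses rapid toggling but delays genuine switches.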

Audio/Speech Coding using the Matching Pursuit with Frame-Based Psychoacoustic Optimized Time-Frequency Dictionaries and its Performance Evaluation

20th IEEE Conference on Signal Processing - Algorithms, Architectures, Arrangements, and Applications (SPA)

Authors: Petrovsky, Alexey; Herasimovich, Vadzim; Petrovsky, Alexander (Belarusian State Univ Informat & Radioelect, Dept Comp Engn, Minsk, Belarus)

ISBN: (print) 9788362065271

This paper presents an audio/speech coding algorithm that uses matching pursuit with a dynamic dictionary formed by wavelet packet decomposition, together with its performance evaluation. The proposed method for selecting the most relevant wavelet coefficients maximizes the match between the auditory excitation scalograms of the original and the modeled signal. The major advantage of this method is that the wavelet packet dictionary is perceptually optimized for each signal segment, which reduces the number of coefficients required to achieve a given perceptual distortion. The flexibility of the algorithm makes it possible to change the bitrate according to the limitations of the transmission channel. An objective evaluation of the reconstructed audio signals with the PEMO-Q model and a comparison with popular modern audio encoders such as Opus and Vorbis are provided. The results show the high quality of the signal reconstructed by the proposed audio/speech coder.

Keywords: audio coding; sparse approximation; matching pursuit; wavelet packet
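For readers unfamiliar with matching pursuit, the sketch below shows the generic greedy algorithm on a small redundant dictionary. It selects atoms by plain inner product, not by the perceptual scalogram criterion or the wavelet packet dictionary of the abstract above; the function names and the toy dictionary are hypothetical, and the atoms are assumed to have unit norm.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse approximation over unit-norm atoms.

    Returns the chosen (atom_index, coefficient) pairs and the residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Pick the atom most correlated with the current residual.
        best_index, best_coef = max(
            ((i, dot(residual, atom)) for i, atom in enumerate(atoms)),
            key=lambda pair: abs(pair[1]),
        )
        # Remove that atom's contribution from the residual.
        residual = [r - best_coef * a
                    for r, a in zip(residual, atoms[best_index])]
        picks.append((best_index, best_coef))
    return picks, residual


if __name__ == "__main__":
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    dictionary = [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [inv_sqrt2, inv_sqrt2, 0.0],  # redundant fourth atom
    ]
    picks, residual = matching_pursuit([3.0, 0.0, -1.0], dictionary, 2)
    print(picks, residual)
```

A perceptually driven coder would replace the inner-product selection rule with a distortion measure from an auditory model, which is the part the abstract identifies as its contribution.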

SIGNAL-ADAPTIVE SWITCHING OF OVERLAP RATIO IN AUDIO TRANSFORM CODING

41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Helmrich, Christian R.; Edler, Bernd (Friedrich Alexander Univ Erlangen Nurnberg, Int Audio Labs Erlangen, Wolfsmantel 33, D-91058 Erlangen, Germany)

ISBN: (print) 9781479999880

Contemporary perceptual audio coders, all of which apply the modified discrete cosine transform (MDCT), with an overlap ratio of 50%, for frequency-domain quantization, provide good coding quality even at low bit-rates. However, relatively long frames are required for acceptable low-rate performance also for quasi-stationary harmonic input, leading to increased algorithmic latency and reduced temporal coding resolution. This paper investigates the alternative approach of employing the extended lapped transform (ELT), with 75% overlap ratio, on such input. To maintain a high time resolution for coding of transient segments, the ELT definition is modified such that frame-wise switching between ELT (for quasi-stationary) and MDCT coding (for non-stationary or non-tonal regions), with complete time-domain aliasing cancelation and no increase in frame length, becomes possible. A new ELT window function with improved side-lobe rejection to avoid framing artifacts is also derived. Blind subjective evaluation of the switched-ratio proposal confirms the benefit of the signal-adaptive design.

Keywords: audio coding; lapped transform; MDCT

SPARSE DECOMPOSITION OF AUDIO SIGNALS USING A PERCEPTUAL MEASURE OF DISTORTION. APPLICATION TO LOSSY AUDIO CODING

18th International Conference on Digital Audio Effects (DAFx)

Authors: Toumi, Ichrak; Derrien, Olivier (CNRS LMA, 31 Chemin Joseph Aiguier, F-13402 Marseille 20, France; Univ Toulon & Var, F-13402 Marseille 20, France)

State-of-the-art audio codecs use time-frequency transforms derived from cosine bases, followed by a quantization stage. The quantization steps are set according to perceptual considerations. In the last decade, several studies applied adaptive sparse time-frequency transforms to audio coding, e.g. on unions of cosine bases using a Matching-Pursuit-derived algorithm [1]. This was shown to significantly improve the coding efficiency. We propose another approach based on a variational algorithm, i.e. the optimization of a cost function that takes into account both a perceptual distortion measure derived from a hearing model and a sparsity constraint, which favors coding efficiency. In this early version, we show that, using a coding scheme without perceptual control of quantization, our method outperforms a codec from the literature with the same quantization scheme [1]. In future work, a more sophisticated quantization scheme would probably allow our method to challenge standard codecs, e.g. AAC.

Keywords: audio coding; Sparse approximation; Iterative thresholding algorithm; Perceptual model

Neural Speech Coding for Real-Time Communications Using Constant Bitrate Scalar Quantization

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2024, vol. 18, no. 8, pp. 1462-1476

Authors: Brendel, Andreas; Pia, Nicola; Gupta, Kishan; Behringer, Lyonel; Fuchs, Guillaume; Multrus, Markus (Fraunhofer Inst Integrated Circuits IIS, Erlangen; Fraunhofer IIS, D-91058 Erlangen, Germany)

Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder is learned. This allows for efficient transmission of the input audio signal. The learned discrete representation of neural codecs is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ) and a lot of effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose and analyze simple alternatives to VQ, which are based on projected Scalar Quantization (SQ). These quantization techniques do not need any additional losses, scheduling parameters or codebook storage thereby simplifying the training of neural audio codecs. For real-time speech communication applications, these neural codecs are required to operate at low complexity, low latency and at low bitrates. We address those challenges by proposing a new causal network architecture that is based on SQ and a Short-Time Fourier Transform (STFT) representation. The proposed method performs particularly well in the very low complexity and low bitrate regime.

Keywords: Codecs; Bit rate; Training; Speech coding; Audio coding; Quantization (signal); Complexity theory; Vectors; Real-time systems; Representation learning; Discrete representation learning; low complexity; neural speech coding; quantization; real-time

New Highly Efficient Hybrid Lossless Audio Coding Techniques

1st URSI Atlantic Radio Science Conference (URSI AT-RASC)

Author: Elsayed, Hend A. (Delta Univ Sci & Technol, Fac Engn, Dept Commun & Comp Engn, Mansoura, Egypt)

ISBN: (print) 9789090086286

In this paper, two new highly efficient hybrid lossless audio coding techniques based on the Burrows-Wheeler Transform (BWT) and the distance transform (DT) are presented. In both techniques, the BWT is first applied to the floating-point samples of the audio signal, and the DT is then applied to the resulting coefficients to obtain coefficients better suited to the next step of lossless compression. In the first proposed method, two entropy-based lossless compression methods are considered, namely arithmetic coding and Huffman coding. In the second proposed method, the entropy coding is preceded by Run Length Encoding (RLE).

Keywords: Burrows-Wheeler Transform; Distance transform; Run Length Encoding; Entropy coding; audio coding
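Two of the building blocks named in the abstract above, the BWT and RLE, can be sketched on a character string as follows. This is a naive textbook version that assumes a sentinel character absent from the input; it operates on text, not on floating-point audio samples, and it omits the distance transform and the entropy coders. All function names are hypothetical.

```python
def bwt(text, sentinel="\0"):
    """Burrows-Wheeler Transform via sorted rotations (naive, O(n^2 log n))."""
    if sentinel in text:
        raise ValueError("input must not contain the sentinel")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="\0"):
    """Invert the BWT by repeatedly prepending and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def run_length_encode(text):
    """Collapse runs of equal symbols into (symbol, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))
    print(run_length_encode(transformed))
    print(inverse_bwt(transformed))
```

The BWT tends to group equal symbols together, which is why a run-length stage placed after it, as in the second method of the abstract, can shorten the input to the entropy coder.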

Engaging Students in Audiovisual Coding Through Interactive MATLAB GUIs

IEEE ACCESS, 2025, vol. 13, pp. 8158-8168

Authors: Cuevas, Carlos; Cortes, Carlos; Garcia, Narciso (Univ Politecnica de Madrid UPM, Informat Proc & Telecommun Ctr IPTC, Grp Tratamiento Imagenes GTI, Madrid 28040, Spain; Univ Politecnica de Madrid UPM, ETSI Telecomunicac, Madrid 28040, Spain)

Traditional educational methods in higher education, such as lectures and hands-on exercises, often struggle to fully engage students or address the complexities inherent in audiovisual signal coding. This gap underscores the need for innovative educational tools that not only foster deeper understanding but also enhance student engagement within the university setting. Effective evaluation of these tools requires comprehensive assessment of both learning outcomes and user experience. This study introduces three novel graphical user interface applications designed to strengthen foundational knowledge in audio, image, and video signal coding. These interactive tools, specifically developed for use in university courses, allow students to experiment with various coding techniques, providing hands-on experience that bridges theoretical concepts and practical application. Furthermore, we propose a robust methodology to evaluate the effectiveness of these tools in improving learning outcomes. The tools, developed using MATLAB, were integrated into a computer vision course and assessed through a combination of pre- and post-tests, the System Usability Scale, and the Evaluation Tool for Learning Quality. Results indicate a significant improvement in student knowledge and positive feedback regarding the usability and educational value of the tools. These findings suggest that the interactive nature of the tools not only enhances knowledge retention but also boosts student engagement, offering a valuable complement to traditional educational methods in higher education.

Keywords: Image coding; Graphical user interfaces; Education; MATLAB; Quantization (signal); Audio coding; Usability; Visualization; Signal to noise ratio; Programming profession; audiovisual coding; graphical tools; interactive learning; MATLAB GUI; student engagement; teaching effectiveness

A neural coding method based on feature sensing

IET COMMUNICATIONS, 2025, vol. 19, no. 1

Authors: He, Dongbin; Hu, Aiqun; Sheng, Kaiwen (Southeast Univ, Sch Informat Sci & Engn, Nanjing, Peoples R China; Southeast Univ, Frontiers Sci Ctr Mobile Informat Commun & Secur, Nanjing, Peoples R China)

Novel networks contain many sensors, which greatly increases the data transmission burden. Some networks require the data perceived by sensors over a period of time to make decisions. Drawing inspiration from the human neural conduction mechanism, a waveform data encoding method called feature sensing neural coding (FSNC) is proposed to enhance network data transmission efficiency. It involves feature decomposition of information and subsequent non-linear encoding of feature coefficients for data transmission. This approach exploits the unique neuronal responses to diverse stimuli and the inherent non-linear characteristics of human neural coding. Taking the speech signal as an example, the effectiveness of FSNC is verified by simulating the auditory nerve conduction process with frequency as a feature, according to the mechanism of travelling wave motion of the basilar membrane in the cochlea. Moreover, experiments on seismic waveform signals demonstrate the wide applicability of FSNC. Compared with traditional speech coding schemes, the FSNC bit rate is only 6.4 kbps, which greatly reduces the amount of data transmitted. FSNC also offers a degree of fault tolerance, and parallel transmission can greatly increase the transmission rate. This research provides new ideas for efficient data transmission over new networks.

Keywords: audio coding; encoding; source coding

A QUASI-ORTHOGONAL, INVERTIBLE, AND PERCEPTUALLY RELEVANT TIME-FREQUENCY TRANSFORM FOR AUDIO CODING

23rd European Signal Processing Conference (EUSIPCO)

Authors: Derrien, Olivier; Necciari, Thibaud; Balazs, Peter (Univ Toulon & Var, Marseille, France; Lab Mecan & Acoust, CNRS, Marseille, France; Acoust Res Inst, OAW, Vienna, Austria)

ISBN: (print) 9780992862633

We describe ERB-MDCT, an invertible real-valued time-frequency transform based on the MDCT, which is widely used in audio coding (e.g. MP3 and AAC). ERB-MDCT was designed similarly to ERBLet, a recent invertible transform whose resolution evolves across frequency to match the perceptual ERB frequency scale, whereas the frequency scale in most invertible transforms (e.g. MDCT) is uniform. ERB-MDCT has mostly the same frequency scale as ERBLet, but the main improvement is that its atoms are quasi-orthogonal, i.e. its redundancy is close to 1. Furthermore, the energy is sparser in the time-frequency plane. Thus, it is more suitable for audio coding than ERBLet.

Keywords: Non-stationary time-frequency transforms; ERB filters; MDCT; audio coding
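To make the contrast between a uniform and an ERB frequency scale concrete, the sketch below evaluates the widely used Glasberg and Moore approximation of the equivalent rectangular bandwidth. The abstract above does not state which ERB formula the authors use, so this choice and the function names are assumptions.

```python
import math


def erb_bandwidth_hz(freq_hz):
    """Equivalent rectangular bandwidth at freq_hz (Glasberg and Moore)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def erb_number(freq_hz):
    """Position of freq_hz on the ERB-rate scale (number of ERBs below it)."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)


if __name__ == "__main__":
    for freq in (100.0, 1000.0, 8000.0):
        print(freq, erb_bandwidth_hz(freq), erb_number(freq))
```

The bandwidth grows roughly in proportion to frequency above a few hundred hertz, so a transform that follows this scale uses narrow bands at low frequencies and wide bands at high frequencies, unlike the equally spaced bins of a plain MDCT.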
