ISBN: 9781479974504 (print)
Modern stereo and multi-channel perceptual audio codecs utilizing the modified discrete cosine transform (MDCT) can achieve very good overall coding quality even at low bit-rates but lack efficiency on some material with an inter-channel phase difference (IPD) of about ±90 degrees. To address this issue, a generalization of the lapped transform coding scheme is proposed which retains the perfect reconstruction property while allowing the use of three further transform kernels, one of which is the modified discrete sine transform (MDST). Blind listening tests indicate that by frame-wise adaptation of each channel's transform kernel to the instantaneous IPD characteristics, notable gains in coding quality are possible with only a negligible increase in decoder complexity and parameter rate.
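As a rough illustration of the perfect reconstruction property mentioned above, the following sketch implements a textbook MDCT and MDST with a sine window and 50% overlap-add. It is not the kernel-switching scheme proposed in the paper: each kernel is used on its own for all frames, and the function names, the sine window, and the 2/N scaling are assumptions chosen for this example.

```python
import math

def sine_window(N):
    # Symmetric sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def _basis(kind):
    # "mdct" uses a cosine kernel, "mdst" a sine kernel (hypothetical labels).
    return math.cos if kind == "mdct" else math.sin

def forward(frame, kind="mdct"):
    # Windowed lapped transform: 2N input samples -> N coefficients.
    N = len(frame) // 2
    w, f, n0 = sine_window(N), _basis(kind), 0.5 + N / 2
    return [sum(w[n] * frame[n] * f(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def inverse(coeffs, kind="mdct"):
    # Windowed inverse: N coefficients -> 2N time-aliased samples.
    N = len(coeffs)
    w, f, n0 = sine_window(N), _basis(kind), 0.5 + N / 2
    return [w[n] * (2.0 / N) * sum(coeffs[k] * f(math.pi / N * (n + n0) * (k + 0.5))
                                   for k in range(N)) for n in range(2 * N)]

def roundtrip(signal, N, kind="mdct"):
    # Analysis followed by synthesis with overlap-add; the aliasing of
    # neighbouring frames cancels when both frames use the same kernel.
    # len(signal) is assumed to be a multiple of N.
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        y = inverse(forward(padded[start:start + 2 * N], kind), kind)
        for n in range(2 * N):
            out[start + n] += y[n]
    return out[N:N + len(signal)]

if __name__ == "__main__":
    x = [math.sin(0.3 * n) for n in range(16)]
    for kind in ("mdct", "mdst"):
        err = max(abs(a - b) for a, b in zip(x, roundtrip(x, 4, kind)))
        print(kind, "max reconstruction error:", err)
```

Mixing the two kernels in adjacent frames would not cancel the aliasing in this naive form, which is presumably why the abstract speaks of a generalized scheme with further kernels.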
ISBN: 9780992862633 (print)
Switching between speech coding and generic audio coding schemes was recently shown to be very efficient for coding a wide range of audio material at low bit-rates. However, it relies strongly on a robust classification of the input signal. The aim of the paper is to design a reliable speech and music discriminator (SMD) for such an application. The main focus was on achieving a good tradeoff between accuracy, reactivity and stability of the decision while keeping delay and complexity reasonably low. To this end, short-term and long-term features are separated before being passed to two different classifiers. The two classifier outputs are combined into a final decision using hysteresis. Objective measures show that a more reliable switching decision is achievable. The SMD was successfully implemented in MPEG Unified Speech and Audio Coding (USAC). It allows the codec to show unprecedented audio quality.
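The combination of two classifier outputs with hysteresis can be sketched as follows. This is only a minimal illustration: the weighted average, the thresholds `low` and `high`, and the function name `smd_decisions` are assumptions made for the example, not details taken from the paper.

```python
def smd_decisions(short_scores, long_scores, low=0.4, high=0.6,
                  weight=0.5, initial="music"):
    """Combine two per-frame speech probabilities into a stable decision.

    The state only flips when the combined score leaves the dead zone
    [low, high]; inside it the previous decision is held (hysteresis).
    All parameter values are hypothetical.
    """
    state = initial
    decisions = []
    for s, l in zip(short_scores, long_scores):
        combined = weight * s + (1.0 - weight) * l
        if combined > high:
            state = "speech"
        elif combined < low:
            state = "music"
        decisions.append(state)
    return decisions

if __name__ == "__main__":
    short = [0.9, 0.55, 0.45, 0.1]
    long_ = [0.9, 0.55, 0.45, 0.1]
    print(smd_decisions(short, long_))
```

A wider dead zone makes the decision more stable but slower to react, which is the accuracy, reactivity and stability tradeoff the abstract refers to.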
ISBN: 9788362065271 (print)
This paper presents an audio/speech coding algorithm that uses matching pursuit with a dynamically formed dictionary based on wavelet packet decomposition, together with its performance evaluation. The proposed methodology for selecting the most relevant wavelet coefficients is based on maximizing the match between the auditory excitation scalograms of the original and the modeled signal. The major advantage of this method is that the wavelet packet dictionary is perceptually optimized for each signal segment, which reduces the number of coefficients required to achieve a given perceptual distortion. The flexibility of the presented algorithm makes it possible to change the bitrate according to the limitations of the transmission channel. An objective evaluation of the reconstructed audio signals with the PEMO-Q model and a comparison with popular modern audio encoders such as Opus and Vorbis are provided. The results obtained show the high quality of the signal reconstructed by the proposed audio/speech coder.
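For readers unfamiliar with matching pursuit, a generic greedy version is sketched below. It selects atoms by plain correlation energy rather than by the perceptual scalogram criterion described in the abstract, and the small Haar-like dictionary merely stands in for a wavelet packet dictionary; both are assumptions of this example.

```python
import math

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit over unit-norm atoms.

    Returns the list of (atom_index, coefficient) picks and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        # Correlate the residual with every atom.
        corrs = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        # Keep the atom that explains the most residual energy.
        best = max(range(len(dictionary)), key=lambda i: abs(corrs[i]))
        coef = corrs[best]
        residual = [r - coef * a for r, a in zip(residual, dictionary[best])]
        picks.append((best, coef))
    return picks, residual

# Hypothetical 4-sample orthonormal Haar-like dictionary.
S = 1.0 / math.sqrt(2.0)
HAAR = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [S, -S, 0.0, 0.0],
    [0.0, 0.0, S, -S],
]

if __name__ == "__main__":
    x = [1.0, 1.0, 1.0 - S, 1.0 + S]  # equals 2*HAAR[0] - 1*HAAR[3]
    picks, residual = matching_pursuit(x, HAAR, 2)
    print(picks, sum(r * r for r in residual))
```

Stopping after fewer atoms lowers the number of coefficients to transmit, which is one simple way a coder of this kind could trade bitrate against distortion.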
ISBN: 9781479999880 (print)
Contemporary perceptual audio coders, all of which apply the modified discrete cosine transform (MDCT), with an overlap ratio of 50%, for frequency-domain quantization, provide good coding quality even at low bit-rates. However, relatively long frames are required for acceptable low-rate performance also for quasi-stationary harmonic input, leading to increased algorithmic latency and reduced temporal coding resolution. This paper investigates the alternative approach of employing the extended lapped transform (ELT), with 75% overlap ratio, on such input. To maintain a high time resolution for coding of transient segments, the ELT definition is modified such that frame-wise switching between ELT (for quasi-stationary) and MDCT coding (for non-stationary or non-tonal regions), with complete time-domain aliasing cancelation and no increase in frame length, becomes possible. A new ELT window function with improved side-lobe rejection to avoid framing artifacts is also derived. Blind subjective evaluation of the switched-ratio proposal confirms the benefit of the signal-adaptive design.
State-of-the-art audio codecs use time-frequency transforms derived from cosine bases, followed by a quantization stage. The quantization steps are set according to perceptual considerations. In the last decade, several studies applied adaptive sparse time-frequency transforms to audio coding, e.g. on unions of cosine bases using a Matching-Pursuit-derived algorithm [1]. This was shown to significantly improve the coding efficiency. We propose another approach based on a variational algorithm, i.e. the optimization of a cost function taking into account both a perceptual distortion measure derived from a hearing model and a sparsity constraint, which favors coding efficiency. In this early version, we show that, using a coding scheme without perceptual control of quantization, our method outperforms a codec from the literature with the same quantization scheme [1]. In future work, a more sophisticated quantization scheme would probably allow our method to challenge standard codecs, e.g. AAC.
Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder is learned. This allows for efficient transmission of the input audio signal. The learned discrete representation of neural codecs is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ) and a lot of effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose and analyze simple alternatives to VQ, which are based on projected Scalar Quantization (SQ). These quantization techniques do not need any additional losses, scheduling parameters or codebook storage thereby simplifying the training of neural audio codecs. For real-time speech communication applications, these neural codecs are required to operate at low complexity, low latency and at low bitrates. We address those challenges by proposing a new causal network architecture that is based on SQ and a Short-Time Fourier Transform (STFT) representation. The proposed method performs particularly well in the very low complexity and low bitrate regime.
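To make the contrast with vector quantization concrete, the sketch below shows one simple form of bounded scalar quantization applied independently to each latent dimension. It is an assumption-laden illustration, not the method of the paper: the learned projection is omitted, and the tanh bounding, the number of levels, and the function name `scalar_quantize` are all hypothetical.

```python
import math

def scalar_quantize(latent, levels=5):
    """Quantize each latent value independently onto a uniform grid in [-1, 1].

    No codebook is stored: the grid is fully defined by `levels`
    (assumed odd here so that zero is a grid point).
    """
    half = (levels - 1) / 2
    quantized = []
    for v in latent:
        bounded = math.tanh(v)          # squash to (-1, 1)
        index = round(bounded * half)   # integer grid index in [-half, half]
        quantized.append(index / half)  # back to the [-1, 1] range
    return quantized

if __name__ == "__main__":
    print(scalar_quantize([0.0, 10.0, -10.0, 0.3]))
```

With `d` latent dimensions and `levels` values per dimension, such a quantizer implicitly spans `levels ** d` code vectors without any learned codebook.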
ISBN: 9789090086286 (print)
In this paper, two new highly efficient hybrid lossless audio coding techniques based on the Burrows-Wheeler Transform (BWT) and the distance transform (DT) are presented. In both techniques, the floating-point samples of the audio signal are first passed through the BWT, and the resulting coefficients are then passed through the DT to obtain coefficients better suited to the subsequent lossless compression step. In the first proposed method, two entropy-based lossless compression methods are considered, namely arithmetic coding and Huffman coding. In the second proposed method, the entropy coding is preceded by Run Length Encoding (RLE).
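A toy version of the BWT followed by run-length encoding may help to show why the two are combined. The sketch works on short character strings rather than floating-point audio samples, uses a naive sort-based BWT with an assumed `$` end marker, and leaves out the distance transform and the entropy coder, whose details the abstract does not give.

```python
def bwt(text, end="$"):
    # Naive Burrows-Wheeler transform: sort all rotations, keep the last column.
    assert end not in text
    text += end
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def inverse_bwt(last_column, end="$"):
    # Rebuild the sorted rotation table one column at a time.
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]

def rle(symbols):
    # Run-length encode into (symbol, run_length) pairs.
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs

if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, rle(transformed), inverse_bwt(transformed))
```

The BWT tends to group identical symbols into runs, which is what makes a run-length stage before the entropy coder worthwhile.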
Traditional educational methods in higher education, such as lectures and hands-on exercises, often struggle to fully engage students or address the complexities inherent in audiovisual signal coding. This gap underscores the need for innovative educational tools that not only foster deeper understanding but also enhance student engagement within the university setting. Effective evaluation of these tools requires comprehensive assessment of both learning outcomes and user experience. This study introduces three novel graphical user interface applications designed to strengthen foundational knowledge in audio, image, and video signal coding. These interactive tools, specifically developed for use in university courses, allow students to experiment with various coding techniques, providing hands-on experience that bridges theoretical concepts and practical application. Furthermore, we propose a robust methodology to evaluate the effectiveness of these tools in improving learning outcomes. The tools, developed using MATLAB, were integrated into a computer vision course and assessed through a combination of pre- and post-tests, the System Usability Scale, and the Evaluation Tool for Learning Quality. Results indicate a significant improvement in student knowledge and positive feedback regarding the usability and educational value of the tools. These findings suggest that the interactive nature of the tools not only enhances knowledge retention but also boosts student engagement, offering a valuable complement to traditional educational methods in higher education.
Novel networks contain many sensors, which greatly increases the data transmission burden. Some networks need the data sensed over a period of time in order to make decisions. Drawing inspiration from the human neural conduction mechanism, a waveform data encoding method called feature sensing neural coding (FSNC) is proposed to enhance network data transmission efficiency. It involves feature decomposition of the information and subsequent non-linear encoding of the feature coefficients for data transmission. This approach exploits the distinct neuronal responses to diverse stimuli and the inherent non-linear characteristics of human neural coding. Taking speech and seismic wave signals as examples, the effectiveness of FSNC is verified by simulating the auditory nerve conduction process with frequency as the feature, following the travelling-wave mechanism of the basilar membrane in the cochlea. Moreover, experiments on seismic waveform signals demonstrate the wide applicability of FSNC. Compared with traditional speech coding schemes, the FSNC bit rate is only 6.4 kbps, which greatly reduces the amount of data transmitted. FSNC also has a certain degree of fault tolerance, and parallel transmission can further increase the transmission rate. This research provides new ideas for efficient data transmission over new networks.
ISBN: 9780992862633 (print)
We describe ERB-MDCT, an invertible real-valued time-frequency transform based on the MDCT, which is widely used in audio coding (e.g. MP3 and AAC). ERB-MDCT was designed similarly to ERBLet, a recent invertible transform whose resolution evolves across frequency to match the perceptual ERB frequency scale, whereas the frequency scale of most invertible transforms (e.g. MDCT) is uniform. ERB-MDCT has mostly the same frequency scale as ERBLet, but the main improvement is that its atoms are quasi-orthogonal, i.e. its redundancy is close to 1. Furthermore, the energy is sparser in the time-frequency plane. Thus, it is more suitable for audio coding than ERBLet.
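The ERB scale mentioned above can be illustrated with the widely used Glasberg and Moore approximation. Whether ERB-MDCT uses exactly this variant is an assumption here, and the function names are hypothetical.

```python
import math

def erb_bandwidth_hz(freq_hz):
    # Equivalent rectangular bandwidth of the auditory filter centred at
    # freq_hz (Glasberg and Moore approximation).
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)

def erb_rate(freq_hz):
    # Position on the ERB-rate scale (number of ERBs below freq_hz).
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)

if __name__ == "__main__":
    for f in (100.0, 1000.0, 8000.0):
        print(f, erb_bandwidth_hz(f), erb_rate(f))
```

The bandwidth grows roughly linearly with frequency, so a transform that follows this scale has fine frequency resolution at low frequencies and coarse resolution at high frequencies, unlike the uniform bins of a plain MDCT.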