We address the problem of integrating directional analysis of sound into the filterbank of a spatial audio coder, with the purpose of processing and coding with some degree of independence the plane waves traveling in different directions. A plane wave represents an elementary waveform in the spatio-temporal analysis of the sound field, the same way a complex exponential is an elementary waveform in the time domain analysis of signals. Since a two-dimensional separable filterbank is not flexible enough for this purpose, we propose a non-separable approach based on the quincunx filterbank with diamond-shaped filters, cascaded with a base transform filterbank. This solution provides an invertible and critically sampled decomposition of the spatio-temporal spectra into subbands representing the different directions of wave propagation.
In this work, we develop a new method for quantization in multistage audio coding. Given a (perceptual) distortion measure and a bit-rate constraint, we analytically derive the optimal rate distribution between subcoders (stages) and the corresponding optimal quantizers using high-rate theory. The analytical solutions for optimal quantizers allow a coder to easily adapt to changes in bit-rate requirements. As an illustration of the new method, we consider quantization in a two-stage sinusoidal/waveform coder, a widely used combination in audio coding. We show that at low total rates most of the rate should be assigned to the sinusoidal (model-based, subspace) subcoder, while at high total rates most of the rate should be assigned to the waveform (full-space) subcoder. We compare the new method to a reference quantization method that does not use rate-distortion optimization. A listening test shows that the new method performs significantly better.
ISBN: 9781424423538 (print)
Recent trends in speech and audio codec standardization include scalability and extending the signal bandwidth beyond wideband (WB) to superwideband (SWB). In this paper we introduce a SWB extension for the ITU-T G.718 WB codec. In the SWB extension the high frequency content is generated utilizing the quantized MDCT domain coefficients of the WB core, which enables low additional delay. The proposed implementation is scalable with 4 kbps layers. In the first layer two different coding modes are used depending on the input signal type. The proposed SWB extension is evaluated with listening tests and complexity analysis.
A new algorithm for the modulated complex lapped transform (MCLT) with a sine windowing function is presented. It is shown that by merging the windowing operation with the main computation, both the real and the imaginary parts of the MCLT with 2N inputs can be obtained from two N-point discrete cosine transforms of type II (DCTs-II) of appropriate inputs. The resulting algorithm is computationally very efficient. In general, the value of N is an even number. When N is a power of 2, the proposed algorithm uses only N log N + 2 real multiplications (including the scaling factors in the DCT computation), with none of those being outside the DCT blocks.
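For orientation, the following is a minimal direct-form sketch of an MCLT with a sine window. It is not the fast DCT-II-based algorithm described in the abstract. It assumes one commonly used MCLT definition, with basis phase (n + (N+1)/2)(k + 1/2)π/N, window sin((n + 1/2)π/(2N)), and scaling sqrt(2/N). Sign and scaling conventions differ between papers, and the function name `mclt_direct` is hypothetical. Its cost is O(N²), so it is only useful as a reference against which a fast implementation could be checked.

```python
import cmath
import math


def mclt_direct(x):
    """Direct O(N^2) MCLT of a block of 2N real samples (assumed convention).

    Returns N complex coefficients. The real part is the cosine-modulated
    (MDCT-like) component; the imaginary part is minus the sine-modulated one.
    """
    two_n = len(x)
    if two_n == 0 or two_n % 2 != 0:
        raise ValueError("input length must be a positive even number (2N)")
    n_half = two_n // 2  # N in the abstract's notation
    scale = math.sqrt(2.0 / n_half)

    # Sine window, computed once for all coefficients.
    window = [math.sin((n + 0.5) * math.pi / (2 * n_half)) for n in range(two_n)]

    coeffs = []
    for k in range(n_half):
        acc = 0j
        for n in range(two_n):
            theta = (n + (n_half + 1) / 2.0) * (k + 0.5) * math.pi / n_half
            # exp(-j*theta) = cos(theta) - j*sin(theta)
            acc += x[n] * window[n] * cmath.exp(-1j * theta)
        coeffs.append(scale * acc)
    return coeffs
```

A block of 2N = 8 samples yields N = 4 complex coefficients, for example `mclt_direct([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])`.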
This study reports a heuristic genetic algorithm to determine the decoding parameters in a first-order ambisonic system for reconstructing a three-dimensional sound field with an arbitrary quad speaker configuration. On this basis, a hardware prototype has been developed using a field programmable gate array (FPGA) to decode ambisonic signals that are encoded in the standard B-format. To allow direct coupling with digital audio sources, the input and output channels of the decoder are implemented with the I2S interface. Evaluations reveal that the decoding parameters derived by this method are superior to existing approaches in terms of flexibility in loudspeaker configuration and optimisation of some of the essential factors in surround sound reconstruction.
In this letter, we propose a frequency and detector pruning approach for reducing the computational complexity associated with loudness estimation. The frequency pruning approach exploits the principles of psychoacoustics such that the total neural activity is preserved. The detector pruning approach evaluates the excitation/loudness patterns at nonuniform sample locations and employs signal interpolation techniques to obtain their corresponding high resolution estimates. Comparative results with the Moore and Glasberg loudness estimation process reveal that the proposed pruning approach for loudness estimation performs consistently well for different types of audio signals with a significant reduction in the computational complexity.
ISBN: 9780769538884 (print)
In present communication systems, high-quality audio is expected to be delivered at a low bit rate and with low computational complexity. This paper proposes a novel bandwidth extension method for audio coding that improves decoded audio quality at the cost of only a few additional coding bits per frame and a small increase in computational complexity. The method computes the high-frequency synthesis filter by codebook mapping and transmits only quantized gain corrections in the high-frequency part of the multiplexed coded bit stream. Preliminary tests show that the method provides audio quality comparable to the high-frequency regeneration of AMR-WB+ with lower bit consumption and computational complexity.
ISBN: 9781424442904 (print)
In this paper, the compressed sensing (CS) methodology is applied to the harmonic part of sinusoidally modeled audio signals. As this part of the model is sparse by definition in the frequency domain, we investigate how CS can be used to encode this signal at low bitrates, instead of encoding the sinusoidal parameters (amplitude, frequency, phase) as current state-of-the-art methods do. We extend our previous work by considering an improved system model, by comparing our model to other schemes, and by exploring the effect of incorrectly reconstructed frames. We show that our approach yields encouraging results, although at this point they remain inferior to the state of the art. Our listening tests indicate that good performance is obtained using 24 bits per sinusoid.
ISBN: 9781424423538 (print)
We consider the rate allocation problem for multiple-description quantization of a signal described by an adaptive model with a fixed structure. Source modeling in coding generally results in a two-stage description of the data, where one stage describes the model parameters and the other describes the signal. Such a setup implies a trade-off between the rate spent on the parameters and the rate spent on the signal. We optimize this trade-off analytically for the multiple-description case using a method inspired by the Minimum Description Length principle. We also provide an algorithm for optimizing the rate allocation between the components of the model-based multiple-description coder. Finally, we confirm our results experimentally. Our method facilitates rate-adaptive multiple-description coding.
ISBN: 9781424423538 (print)
We propose a new frequency-domain BandWidth Extension (BWE) technology. In the new technology, FFT-based frequency-domain gain shaping combined with Linear Prediction Coding (LPC) based spectral envelope shaping is used to generate high-frequency signals. To preserve the amount of noise component in the reconstructed band, gain reduction controlled by the Spectrum Flatness Measurement (SFM) is employed. Subjective testing results show that the presented technology achieves performance comparable to 3GPP AMR-WB+ at the same bit rate in the framework of the Audio Video coding Standard of China (AVS) Part 10 - Mobile Speech and Audio Codec. This technology has been formally adopted as the artificial high band coding module in AVS P10.
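As a point of reference for the SFM mentioned above, spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum: values near 1 indicate a noise-like band, values near 0 a tonal one. The sketch below assumes that standard definition. The function name `spectral_flatness` and the small `eps` guard are illustrative choices, and the abstract does not say how the codec maps the SFM to a gain reduction, so that step is not shown.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    eps is an assumed guard that keeps log() finite for zero-valued bins.
    """
    bins = [float(v) + eps for v in power_spectrum]
    if not bins:
        raise ValueError("power spectrum must not be empty")
    # Geometric mean computed in the log domain for numerical stability.
    log_mean = sum(math.log(v) for v in bins) / len(bins)
    arithmetic_mean = sum(bins) / len(bins)
    return math.exp(log_mean) / arithmetic_mean
```

A perfectly flat spectrum such as `[2.0] * 16` gives a value of about 1, while a single dominant bin such as `[1.0] + [0.0] * 15` gives a value close to 0.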