ISBN: 0780378504 (print)
Sinusoidal coding plays an important role in low-rate audio coding. Typically, differential techniques are employed to reduce the bit rate for representing the sinusoidal parameters. In this paper we compare several schemes for time-differential (TD) and frequency-differential (FD) representation of the sinusoidal model parameters. We show through simulation experiments that bit rates obtained with FD techniques are competitive with those achieved using TD techniques, provided that a variable-length segmentation algorithm is applied to the input signal first. This result is important because FD techniques (in contrast to TD techniques) do not rely on the presence of previous segments for correct decoding, and therefore offer robust performance in a lossy packet channel environment.
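The difference between the two representations can be illustrated with a minimal sketch. It is not the paper's scheme: it assumes already-quantized integer frequencies, a fixed number of components per segment, and hypothetical function names. TD codes each component relative to the same component in the previous segment, while FD codes each component relative to its lower neighbour in the same segment.

```python
def td_encode(frames):
    """Time-differential: code each value relative to the previous frame."""
    encoded, prev = [], None
    for frame in frames:
        if prev is None:
            encoded.append(list(frame))  # first frame is sent as-is
        else:
            encoded.append([f - p for f, p in zip(frame, prev)])
        prev = frame
    return encoded


def td_decode(encoded):
    """Inverse of td_encode; every frame depends on all earlier frames."""
    frames, prev = [], None
    for deltas in encoded:
        if prev is None:
            frame = list(deltas)
        else:
            frame = [d + p for d, p in zip(deltas, prev)]
        frames.append(frame)
        prev = frame
    return frames


def fd_encode(frame):
    """Frequency-differential: code each value relative to its lower neighbour."""
    return [frame[0]] + [b - a for a, b in zip(frame, frame[1:])]


def fd_decode(deltas):
    """Inverse of fd_encode; needs only the current frame's data."""
    frame, acc = [], 0
    for d in deltas:
        acc += d
        frame.append(acc)
    return frame


frames = [[100, 200, 300], [102, 198, 305]]
td = td_encode(frames)
fd_second = fd_encode(frames[1])
```

In this sketch, `fd_decode(fd_second)` recovers the second segment even if the first was lost in transit, whereas `td_decode` cannot reconstruct a segment without its predecessors.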
ISBN: 953184061X (print)
In this paper we propose a new approach based on energy-adaptive matching pursuits to improve sinusoidal modelling for parametric speech and audio coding. To reduce the complexity of the algorithm, an over-complete dictionary composed of complex exponentials is used. An analysis-synthesis window scheme that avoids overlapping is proposed. For efficient quantization of the sinusoidal model parameters, a new algorithm that largely reduces the side information required by the decoder is described. Experimental results indicate the advantages of integrating the proposed method into multi-part models for parametric speech/audio coding.
ISBN: 0780378504 (print)
A novel spatial audio coding technique that allows individualized 3D audio presentation is described. It exploits the dichotomous roles of the low-frequency interaural timing and level difference cues versus the high-frequency spectral cues in human sound localization. The high-frequency spectral cues are modified to match the acoustics of the listener's outer ears, while preserving the original low-frequency interaural cues. The theory behind the coding technique is described and sound localization data are shown demonstrating the fidelity of the coding technique. Issues relating to the frequency resolution of directional frequency bands are explored.
A novel approach using non-negative matrix factorization (NMF) for onset detection of musical notes from audio signals is presented. Unlike most commonly used conventional approaches, the proposed method exploits a new detection function constructed from the linear temporal bases that are obtained from a non-negative matrix decomposition of musical spectra. Both first-order difference and psychoacoustically motivated relative difference functions of the temporal profile are considered. As the approach works directly on input data, no prior knowledge or statistical information is thereby required. A practical issue of the choice of the factorization rank is also examined experimentally. Numerical examples are provided to show the performance of the proposed method.
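The two detection functions mentioned above can be sketched as follows. This is an illustrative sketch only: it assumes the temporal profile `h` has already been obtained from an NMF of the magnitude spectrogram, and the function names, the `eps` guard, and the simple peak picker are assumptions, not the authors' definitions.

```python
def first_order_difference(h):
    """Frame-to-frame change of the temporal profile h."""
    return [0.0] + [b - a for a, b in zip(h, h[1:])]


def relative_difference(h, eps=1e-9):
    """Change relative to the current level (roughly a log-derivative),
    which makes the function less dependent on absolute loudness."""
    return [0.0] + [(b - a) / (b + eps) for a, b in zip(h, h[1:])]


def pick_onsets(d, threshold):
    """Indices of local maxima of d that exceed the threshold."""
    return [
        t
        for t in range(1, len(d) - 1)
        if d[t] > threshold and d[t] >= d[t - 1] and d[t] > d[t + 1]
    ]


# Hypothetical temporal profile with two note attacks, at frames 2 and 7.
h = [0.1, 0.1, 1.0, 0.9, 0.8, 0.1, 0.1, 2.0, 1.5]
onsets_diff = pick_onsets(first_order_difference(h), threshold=0.5)
onsets_rel = pick_onsets(relative_difference(h), threshold=0.5)
```

On this toy profile both functions flag the same two frames; on real signals the relative difference tends to respond more evenly to soft and loud attacks.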
In this paper, we propose a low-complexity architecture design for the psychoacoustic model (PAM). The PAM is a key component of the MPEG-2/4 Advanced Audio Coding (AAC) encoder. It accounts for a heavy share of the computational load in the AAC encoder and makes the encoder hard to implement on portable devices under real-time conditions. To address these problems, we propose an MDCT-based PAM algorithm and a dedicated architecture to accelerate the PAM calculation. The main advantage of the MDCT-based PAM is that the number of filterbanks in AAC can be reduced from three to two. Furthermore, we use a look-up table to replace the computation of the spreading function. In addition, the logarithmic number system (LNS) is used to reduce the computational load of many special functions and the data word length. In the hardware architecture design, pipelining is used to increase the throughput of the PAM. Our design also uses a logarithmic unit that converts data into the log scale in one cycle. The proposed PAM architecture is implemented in UMC 0.18 CMOS technology. The total gate count is 69476.
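Why a logarithmic number system lowers the cost of such computations can be seen in a small software sketch. It is only an illustration of the general LNS idea, not the paper's hardware design: values are stored as a sign and a base-2 logarithm, so multiplication, division, and square roots reduce to additions, subtractions, and a halving. All names here are hypothetical.

```python
import math


def to_lns(x):
    """Represent a nonzero real as (sign, log2|x|)."""
    if x == 0:
        raise ValueError("zero has no finite logarithm; it needs a special code")
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_lns(v):
    """Convert (sign, log2|x|) back to a float."""
    sign, log_mag = v
    return sign * 2.0 ** log_mag


def lns_mul(a, b):
    """Multiplication becomes an addition of logarithms."""
    return (a[0] * b[0], a[1] + b[1])


def lns_div(a, b):
    """Division becomes a subtraction of logarithms."""
    return (a[0] * b[0], a[1] - b[1])


def lns_sqrt(a):
    """Square root becomes a halving (a shift in fixed-point hardware)."""
    if a[0] < 0:
        raise ValueError("square root of a negative value")
    return (1, a[1] / 2)


product = from_lns(lns_mul(to_lns(3.0), to_lns(-4.0)))
quotient = from_lns(lns_div(to_lns(10.0), to_lns(4.0)))
root = from_lns(lns_sqrt(to_lns(16.0)))
```

Addition and subtraction are the expensive operations in an LNS and are not shown here; hardware designs typically handle them with look-up tables.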
We recently proposed a multichannel audio coding method using a multiband source/filter model, which results in a compact representation of the original recording. Our method can reproduce the original recording using only one audio channel and side information for the remaining channels on the order of 5 KBps/channel. Here, we examine packet loss concealment strategies for use within our model, so that we can derive a complete system for low-bitrate multichannel audio streaming over the Internet or wireless channels.
Sinusoidal coding is an essential tool in low-rate audio coding, and developing an efficient quantization scheme for the sinusoidal parameters is therefore crucial. In this work we derive optimal entropy-constrained amplitude, phase and frequency quantizers for sinusoids whose frequencies are harmonically related, with respect to the ℓ2 distortion measure. This scheme exploits the harmonic structure of many speech and audio signals in the sense that, besides amplitudes and phases, only fundamental frequencies need to be quantized, resulting in a significant decrease in the number of bits assigned to frequency parameters. The asymptotically optimal quantizers minimize a high-resolution approximation of the expected ℓ2 distortion while the corresponding quantization indices satisfy an entropy constraint. The quantizers turn out to be flexible and of low complexity, in the sense that they can be determined easily for varying bit rate requirements, without any retraining or iterative procedures. In an objective rate-distortion comparison, the proposed scheme is shown to outperform two variants of a recently proposed scheme in which all frequency parameters are quantized separately, either directly or differentially.
To improve coding efficiency, this paper attempts to combine the psychoacoustic model used for perceptual evaluation of audio quality in BS.1387 with a perceptual audio coder. The principle of this new psychoacoustic model is analyzed theoretically, and corresponding improvements are proposed so that it can be applied effectively in an actual audio coder. Both this new model and MPEG psychoacoustic model 2 are implemented in the latest AVS reference coder of China, and the two models are compared in terms of their output masking parameters and in subjective listening tests. Experimental results show that the proposed psychoacoustic model is feasible.
In this paper, a signal-adaptive, stereo-to-mono downmixing scheme associated with MPEG-4 parametric stereo (PS) encoding is presented. The proposed scheme minimizes signal cancellation and coloration due to inter-channel phase misalignment. Using the inter-channel phase difference information, which is calculated as a PS spatial parameter, the phases of the stereo signals are aligned. Subsequently, a simple averaging is carried out to mix the signals. The need to perform power equalization to preserve the overall power of the stereo signals in the downmix signal is eliminated. This leads to a significant saving of computational power. The scheme allows the overall phase difference (OPD), one of the spatial parameters sent in the PS bitstream, to be coded with minimum bit consumption. As a result, the entropy is reduced from 1.31 bits/symbol to 1 bit/symbol. This scheme is especially useful for stereo audio with a significant amount of side-signal component.
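The cancellation problem and the effect of phase alignment can be shown with a small sketch. It is a simplification under stated assumptions, not the paper's algorithm: each channel is a list of complex subband samples, the inter-channel phase difference is computed per sample, and the whole rotation is applied to the right channel (the actual scheme may distribute the rotation differently). Function names are hypothetical.

```python
import cmath


def plain_downmix(left, right):
    """Simple averaging; out-of-phase content cancels."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def phase_aligned_downmix(left, right):
    """Rotate the right channel onto the left channel's phase, then average."""
    out = []
    for l, r in zip(left, right):
        ipd = cmath.phase(l * r.conjugate())  # inter-channel phase difference
        r_aligned = r * cmath.exp(1j * ipd)   # right now has the phase of left
        out.append((l + r_aligned) / 2)
    return out


# Hypothetical subband samples: the first pair is fully out of phase,
# the second pair is already in phase.
left = [1 + 0j, 1j]
right = [-1 + 0j, 1j]

plain = plain_downmix(left, right)
aligned = phase_aligned_downmix(left, right)
```

With plain averaging the out-of-phase pair sums to zero, while the aligned downmix keeps its magnitude; the in-phase pair is unchanged by either method.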