Compression algorithms face a constant tradeoff: higher compression ratios come at the cost of quality. In standard MPEG encoders, the number of bits assigned is controlled by the signal-to-mask thresholds and the scalefactor calculations performed in the psychoacoustic model. The developed algorithm assigns fewer bits to audio samples without significant degradation in quality.
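The role of the signal-to-mask ratio (SMR) in bit allocation can be illustrated with a minimal sketch. This is neither the algorithm developed in the paper nor the exact MPEG procedure: it assumes a simple greedy loop in which every added bit lowers the quantization noise of a band by about 6.02 dB, and the function name `allocate_bits` and the SMR values are hypothetical.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: repeatedly give one bit to the band whose
    quantization noise is highest relative to its masking threshold."""
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # Noise-to-mask ratio per band: SMR minus the SNR bought so far.
        nmr = [s - db_per_bit * b for s, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all noise already lies below the masking threshold
        bits[worst] += 1
    return bits


# Hypothetical per-band SMR values in dB (assumption, not from the paper).
smr = [18.0, 6.0, -3.0, 12.0]
allocation = allocate_bits(smr, bit_pool=8)
print(allocation)
```

In this sketch a band whose signal is already below the masking threshold (negative SMR) receives no bits, and the loop stops early once every band is masked, so part of the bit pool may remain unused.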
Note onsets mark the beginning of attack transients, short regions of a note in which the spectral content of the signal changes rapidly. Detecting onsets is not trivial, especially when analysing complex mixtures. Applications for note onset detection systems include time stretching, audio coding and synthesis. An alternative to standard energy-based onset detection that uses phase information is proposed. It is suggested that by observing the frame-by-frame distribution of differential angles, the precise moment at which an onset occurs can be detected accurately. Statistical measures are used to build the detection function. The system is tested and tuned on a database of complex recordings.
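The idea of a phase-based detection function can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: the differential angle is taken to be the second difference of the bin phase across frames, the statistical measure is the mean absolute deviation over bins with significant energy, and the frame size, hop size, threshold and test signal are all hypothetical. A naive DFT keeps the example free of dependencies.

```python
import cmath
import math


def princarg(x):
    """Wrap an angle to the interval [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def stft(signal, frame=64, hop=16):
    """Hann-windowed DFT of each frame (naive O(N^2) DFT, fine for a demo)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    spectra = []
    for start in range(0, len(signal) - frame + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame)]
        spectra.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame))
            for k in range(frame // 2 + 1)
        ])
    return spectra


def phase_detection_function(spectra, rel_threshold=0.1):
    """Mean absolute second difference of phase per frame.

    For a stationary sinusoid the phase advances linearly, so the second
    difference is close to zero; it grows when the signal changes.
    Entry i of the result belongs to frame i + 2.
    """
    df = []
    for t in range(2, len(spectra)):
        peak = max(abs(c) for c in spectra[t])
        devs = []
        for k, c in enumerate(spectra[t]):
            if abs(c) < rel_threshold * peak:
                continue  # skip bins whose phase is dominated by noise
            p0 = cmath.phase(spectra[t - 2][k])
            p1 = cmath.phase(spectra[t - 1][k])
            p2 = cmath.phase(c)
            devs.append(abs(princarg(p2 - 2 * p1 + p0)))
        df.append(sum(devs) / len(devs) if devs else 0.0)
    return df


# Hypothetical test signal: one tone that switches frequency at sample 256.
signal = [
    math.cos(2 * math.pi * (4 if n < 256 else 10) * n / 64) for n in range(512)
]
df = phase_detection_function(stft(signal))
peak_frame = 2 + max(range(len(df)), key=lambda i: df[i])
print(peak_frame, peak_frame * 16)  # frame index and its start sample
```

The detection function stays near zero while the tone is stationary and rises only in the frames whose analysis windows overlap the change, which is the behaviour an onset detector thresholds or peak-picks.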
A scalably compressed bitstream is one which can be streamed and decoded at a wide variety of bitrates, and it is therefore compatible with communications channels of varying capacity. The audio coding portions of the MPEG-2 and MPEG-4 standards support fine-grained scalability through the use of bit-sliced arithmetic coding (BSAC). Human subjective analysis of BSAC, however, has shown that it performs poorly at low bitrates: seemingly random tonal patterns are superimposed on the actual audio. Here, we develop a new approach for objectively characterizing such distortion and validate it with human subjective trials. Unlike most other objective performance metrics, the proposed approach does not require sample-accurate sequence synchronization. As a comparison, we also apply the ITU-R BS.1387-1 objective testing recommendation to the same audio sequences and quantify how well it predicts the observed subjective quality.
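Fine-grained scalability through bit slicing can be illustrated with a small sketch. It shows only the bit-plane ordering that lets a decoder stop early; the arithmetic coding stage of BSAC is omitted, and the function names and coefficient values are hypothetical.

```python
def to_bit_planes(values, num_planes):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> p) & 1 for v in values] for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes, num_planes):
    """Rebuild values from however many leading planes were received."""
    values = [0] * len(planes[0]) if planes else []
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


coeffs = [13, 2, 7, 0]                    # hypothetical quantized magnitudes, 4 bits each
planes = to_bit_planes(coeffs, 4)
coarse = from_bit_planes(planes[:2], 4)   # decoder stops after two planes
exact = from_bit_planes(planes, 4)        # decoder receives everything
print(coarse, exact)
```

Truncating the stream after any plane still yields a valid, coarser reconstruction, which is what makes the bitstream decodable at many bitrates.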
At ISMIR 2002 and CBMI 2001, the authors presented a new approach to audio fingerprinting (Haitsma, J. and Kalker, T., Proc. Int. Conf. on Music Information Retrieval, p.107-15, 2002; Haitsma et al., Proc. Int. Workshop on Content-Based Multimedia Indexing, p.117-25, 2001). The proposed scheme, which we refer to as the streaming audio fingerprinting (SAF) system, allows a very efficient database lookup and is also very robust against many different audio processing steps, including low bit rate audio coding, noise addition and amplitude compression. However, it is not inherently robust against large linear speed changes (i.e. speed changes larger than 2%), in which both the pitch and the tempo change. This is a potential problem, because some radio stations speed up audio by a few percent. We discuss a modification of the originally proposed fingerprinting algorithm that makes it robust against large linear speed changes. The proposed modification has a negligible effect on other aspects, such as robustness and reliability.
ISBN: 7563506861 (print)
In this paper, a scalable audio coding scheme is presented that is based mainly on embedded zerotree wavelet (EZW) coding. First, the audio signal is split into 29 critical subbands by a digital wavelet packet transform (DPWT). Zerotree coding is then applied to these subbands. Finally, entropy coding removes the remaining redundancy and a specific frame structure is formed. The resulting encoder supports a scalable bit stream from 16 kbps to 64 kbps in 4 kbps steps for a single audio channel, with graceful degradation of subjective audio quality.
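The operating points implied by the stated bitrate range can be worked out in a few lines. The 20 ms frame duration used below is a hypothetical value chosen for illustration; the abstract does not state the frame length.

```python
BASE_KBPS, TOP_KBPS, STEP_KBPS = 16, 64, 4

# Every bitrate at which the embedded stream can be truncated and decoded.
rates = list(range(BASE_KBPS, TOP_KBPS + 1, STEP_KBPS))
enhancement_layers = len(rates) - 1  # layers stacked on the 16 kbps base


def bits_per_frame(rate_kbps, frame_ms):
    """kbit/s times ms equals bits, since the factors of 1000 cancel."""
    return rate_kbps * frame_ms


FRAME_MS = 20  # assumption, not from the paper
print(len(rates), enhancement_layers)
print(bits_per_frame(BASE_KBPS, FRAME_MS), bits_per_frame(STEP_KBPS, FRAME_MS))
```

Under that assumed frame length, the base layer would occupy 320 bits per frame and each 4 kbps refinement step would add 80 bits.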
We investigate the use of nonuniform cosine-modulated filter banks for audio coding. A rate-distortion framework is employed, similar to the work in Herley et al. (1994), to select the filter bank structure from a large library of possible frequency decompositions. A new flexible frequency decomposition algorithm is proposed that jointly optimizes the filter bank structure and the bit allocation over the subband channels. Experimental results for both synthetic and real audio signals are provided. The new algorithm shows significant improvements in comparison with fixed uniform frequency decompositions, but special care has to be taken to reduce the size of the decomposition overhead.
ISBN: 0780374029 (print)
We present a novel concept for representing multi-channel audio signals: Binaural Cue Coding (BCC). BCC aims to separate the basic audio content from the information relevant for spatial perception. A multi-channel audio signal is represented as a mono signal and BCC parameters. We present two types of applications of BCC. In the first, a number of separate sound source signals are reduced to a mono signal and BCC parameters. In this case, the decoder has control over the location of each source in auditory space. In other words, the decoder can render spatial images as if the separate source signals were given. In the second, a multi-channel audio signal is reduced to a mono signal and BCC parameters. In this case, the decoder generates a multi-channel signal with a spatial image similar to that of the encoder's input signal. Results from a subjective test suggest that BCC, combined with existing mono audio coders, offers better quality than conventional stereo and multi-channel perceptual transform audio coders for a wide range of bitrates.
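The "mono signal plus parameters" representation can be sketched for the stereo case. This is a heavily simplified illustration and not the authors' scheme: it assumes a single full-band level-difference cue per frame, whereas a real system would work per subband and would also carry time-difference cues. The function names, frame length and test signal are hypothetical.

```python
import math


def bcc_encode(left, right, frame=4):
    """Downmix to mono and keep one level-difference cue (in dB) per frame."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    cues = []
    for s in range(0, len(left), frame):
        e_left = sum(x * x for x in left[s:s + frame]) + 1e-12
        e_right = sum(x * x for x in right[s:s + frame]) + 1e-12
        cues.append(10 * math.log10(e_left / e_right))
    return mono, cues


def bcc_decode(mono, cues, frame=4):
    """Re-pan the mono signal so each frame has the transmitted level difference."""
    left, right = [], []
    for i, x in enumerate(mono):
        ratio = 10 ** (cues[i // frame] / 20)  # amplitude ratio left/right
        gain_right = 2 / (1 + ratio)           # keeps (left + right) / 2 == mono
        gain_left = ratio * gain_right
        left.append(gain_left * x)
        right.append(gain_right * x)
    return left, right


# Hypothetical single source, amplitude-panned towards the left channel.
source = [1.0, -0.5, 0.25, 0.75, -1.0, 0.5, 0.3, -0.2]
left_in = [0.8 * s for s in source]
right_in = [0.2 * s for s in source]
mono, cues = bcc_encode(left_in, right_in)
left_out, right_out = bcc_decode(mono, cues)
print([round(c, 2) for c in cues])
```

For a single amplitude-panned source this toy decoder recovers both channels almost exactly; with several overlapping sources a full-band cue is no longer sufficient, which is why cues are normally estimated per subband.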
ISBN: 0819436828 (print)
The time-frequency tiling, bit allocation and quantizer of most perceptual coding algorithms are either fixed or controlled by a perceptual model. The large variety of existing audio signals, each with different coding requirements due to its temporal and spectral fine structure, suggests the use of a signal-adaptive algorithm. The framework described in this paper uses a signal-adaptive wavelet filterbank in which any node of the wavelet-packet tree can be switched individually. Each subband can therefore have an individual time segmentation, and the overall time-frequency tiling can be adapted to the signal using optimization techniques. A rate-distortion optimality criterion can be defined that minimizes the distortion for a given rate in every subband, based on a perceptual model. Because the rate and distortion measures are additive over disjoint covers of the input signal, an overall cost function that includes the cost of switching the filterbank can be defined. Using dynamic programming techniques, the wavelet-packet tree can be pruned based on a top-down or bottom-up "split-merge" decision at every node of the tree. Additionally, because each subband can have an individual segmentation in time, temporal masking can be exploited without introducing time-domain artifacts such as pre-echo distortion.
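The bottom-up split-merge decision can be sketched with a Lagrangian cost J = D + λR, which is additive over the children of a node. This is a minimal sketch under stated assumptions, not the paper's implementation: the `Node` class, the per-node rate and distortion figures, the value of λ and the way the switching cost is charged (as extra rate per split) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    rate: float                     # bits needed if this node is coded unsplit
    distortion: float               # perceptual distortion if coded unsplit
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    split: bool = False             # decision filled in by prune()


def prune(node, lam, split_cost=0.0):
    """Return the minimal cost J = D + lam * R of the subtree rooted at node.

    Children are evaluated first (bottom-up); a node stays split only when
    its children, plus the cost of signalling the split, beat the merged node.
    """
    merged = node.distortion + lam * node.rate
    if node.left is None or node.right is None:
        node.split = False
        return merged
    children = (prune(node.left, lam, split_cost)
                + prune(node.right, lam, split_cost)
                + lam * split_cost)
    node.split = children < merged
    return min(children, merged)


# Hypothetical two-level tree.
left = Node(rate=4, distortion=2, left=Node(3, 0.5), right=Node(3, 0.5))
right = Node(rate=5, distortion=3)
root = Node(rate=10, distortion=8, left=left, right=right)
best_cost = prune(root, lam=0.5, split_cost=1.0)
print(best_cost, root.split, left.split)
```

In this example the root is worth splitting, while the left child is merged because the extra rate of its children outweighs their lower distortion.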
ISBN: 0780362934 (print)
A novel concept for perceptual audio coding is presented which is based on the combination of a pre- and post-filter, controlled by a psychoacoustic model, with a transform coding scheme. This paradigm allows the temporal and spectral shape of the masked threshold to be modeled with a resolution independent of the transform used. By using frequency warping techniques, the maximum possible detail for a given filter order can be made frequency-dependent and thus better adapted to the human auditory system. The filter coefficients are represented efficiently by LSF parameters, which can be adaptively interpolated over time. First experiments with a system obtained by extending an existing transform codec showed that this approach can significantly improve the performance for speech signals, while the performance for other signals remained the same.
In audio communication over a lossy packet network, concealment techniques are used to mitigate the effects of lost packets. This concealment is markedly improved if the compressed representation retains redundancy to aid in the estimation of lost information. A perceptual audio coder employing multiple description correlating transforms demonstrates this phenomenon.
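How retained redundancy helps concealment can be sketched with a two-coefficient correlating transform. The sketch assumes two independent zero-mean coefficients with known variances, a 45-degree rotation as the correlating transform, and linear estimation of the lost description; the function names and numbers are hypothetical, and quantization and the perceptual coder itself are omitted.

```python
import math

S = math.sqrt(0.5)


def correlate(x1, x2):
    """Rotate by 45 degrees: the two outputs become statistically correlated."""
    return S * (x1 + x2), S * (x1 - x2)


def decorrelate(y1, y2):
    """Inverse of correlate (the rotation matrix is its own inverse)."""
    return S * (y1 + y2), S * (y1 - y2)


def conceal_y2(y1, var1, var2):
    """Linear estimate of a lost y2 from the received y1.

    For independent inputs with variances var1 and var2,
    E[y1 * y2] = (var1 - var2) / 2 and E[y1^2] = (var1 + var2) / 2.
    """
    rho = (var1 - var2) / (var1 + var2)
    return rho * y1


# Hypothetical coefficients with assumed variances 4 and 1.
x1, x2 = 2.0, -0.5
y1, y2 = correlate(x1, x2)
y2_hat = conceal_y2(y1, var1=4.0, var2=1.0)   # the packet carrying y2 was lost
x1_hat, x2_hat = decorrelate(y1, y2_hat)
print(round(x1_hat, 3), round(x2_hat, 3))
```

When both descriptions arrive, `decorrelate` restores the inputs exactly; when one is lost, the correlation introduced by the transform lets the decoder do better than substituting zero, at the price of a less compact representation.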