Sinusoidal coding plays an important role in low-rate audio coding. Typically, (time/frequency) differential techniques are employed to reduce the bit rate for representing the sinusoidal components. In this paper we derive optimal entropy-constrained differential quantisers for quantising the sinusoid parameters. More specifically, the quantisers minimise a perceptually relevant distortion measure while the corresponding quantisation indices satisfy an entropy constraint. The quantisers turn out to be flexible and of low complexity. Subjective evaluations with audio signals suggest a bit-rate reduction as high as 20% with the derived quantisers over state-of-the-art (logarithmic) quantisers.
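The idea of an entropy constraint can be illustrated with a generic Lagrangian index choice: each candidate index is charged its distortion plus a multiplier times its ideal code length. This is only a minimal sketch, not the quantisers derived in the paper. The squared-error distortion stands in for the perceptual measure, and the names `levels`, `probs` and `lam` are assumptions made for illustration.

```python
import math

def ec_quantise(x, levels, probs, lam):
    """Pick the index minimising distortion + lam * rate.

    levels: hypothetical reconstruction values
    probs:  assumed index probabilities (rate = -log2 p bits)
    lam:    Lagrange multiplier trading distortion against rate
    """
    best_index, best_cost = 0, float("inf")
    for i, (y, p) in enumerate(zip(levels, probs)):
        distortion = (x - y) ** 2   # squared error as a stand-in measure
        rate = -math.log2(p)        # ideal code length in bits
        cost = distortion + lam * rate
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index

levels = [0.0, 1.0, 2.0]
probs = [0.7, 0.2, 0.1]
nearest = ec_quantise(0.6, levels, probs, lam=0.0)  # pure nearest-neighbour choice
cheaper = ec_quantise(0.6, levels, probs, lam=1.0)  # rate term favours the likely index
```

With `lam = 0` the rule reduces to nearest-neighbour quantisation; a larger `lam` moves decisions toward more probable, and therefore cheaper, indices.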
The recently introduced MPEG standard for lossless audio coding, MPEG-4 audio Scalable to Lossless (SLS) coding technology, provides a universal audio format that integrates the functionalities of lossy audio coding, lossless audio coding and fine granular scalable audio coding in a single framework. We propose two coding methods that improve the coding efficiency of SLS, namely, a context-based arithmetic code (CBAC) method and a low energy mode code method. These two coding methods work harmoniously with the current SLS framework and preserve all its desirable features, such as fine granular scalability, while improving its lossless compression ratio.
In this paper a new predictive lossless coding scheme is proposed. The prediction is based on a cascaded peak-to-valley linear prediction (PVLP) method, which applies simple linear prediction between detected feature points. Experimental results on different types of music and songs show a compression ratio that is competitive with other lossless audio compression algorithms.
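Linear prediction between feature points can be sketched as follows. This is an illustrative guess at the general idea, not the paper's cascaded PVLP algorithm: it assumes the feature points are local peaks and valleys of an integer signal, predicts the samples between them by integer linear interpolation, and keeps the residual so that reconstruction is exact. All function names are hypothetical.

```python
def find_extrema(x):
    """Indices of local peaks and valleys, plus both endpoints."""
    idx = [0]
    for n in range(1, len(x) - 1):
        # A sign change in the slope marks a peak or a valley.
        if (x[n] - x[n - 1]) * (x[n + 1] - x[n]) < 0:
            idx.append(n)
    if len(x) > 1:
        idx.append(len(x) - 1)
    return idx

def predict_between(x, idx):
    """Integer linear interpolation between consecutive anchor samples."""
    pred = list(x)
    for a, b in zip(idx, idx[1:]):
        for n in range(a + 1, b):
            pred[n] = x[a] + ((x[b] - x[a]) * (n - a)) // (b - a)
    return pred

def encode(x):
    """Return anchor indices, anchor values and the prediction residual."""
    idx = find_extrema(x)
    pred = predict_between(x, idx)
    return idx, [x[i] for i in idx], [s - p for s, p in zip(x, pred)]

def decode(num_samples, idx, values, residual):
    """Rebuild the signal exactly from anchors and residual."""
    anchors = [0] * num_samples
    for i, v in zip(idx, values):
        anchors[i] = v
    pred = predict_between(anchors, idx)
    return [p + r for p, r in zip(pred, residual)]

signal = [0, 2, 5, 3, 1, 4]
idx, values, residual = encode(signal)
restored = decode(len(signal), idx, values, residual)
```

The residual is zero at the anchors and small wherever the signal is close to piecewise linear, which is what an entropy coder would then exploit.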
Ubiquitous streaming of rich media has long been one of the most difficult challenges, and at the same time it has inspired the most rewarding killer applications. With the increasing bandwidth available to users, the expanding pervasiveness of multimedia-ready devices, and growth in rich media content, the dream of streaming rich media is coming closer to reality. However, interoperability is still one of the important remaining challenges. The Internet Streaming Media Alliance (ISMA) is working toward the goal of interoperability of streaming rich media (video, audio, and data) over Internet protocol (IP) networks by developing open streaming standards. Some of ISMA's interoperability testing work takes the form of plugfests that provide intense interactions and exchange of media streams among tools and systems. This article describes how ISMA addresses interoperability testing and conformance, working toward the vision of seamlessly interworking streaming media devices.
MPEG's most recent effort to progress the state of the art is the MPEG Surround work item. It provides an efficient method for coding multichannel sound via the transmission of a compressed stereophonic (or even monophonic) audio program plus a low-rate side-information channel. Benefits of this approach include backward compatibility with pervasive stereo playback systems while permitting next-generation players to reconstruct high-quality multichannel sound.
ISBN: 9806560477 (print)
A multi-mode harmonic transform coding (MMHTC) scheme for speech and music signals is proposed. Its structure is organized as a linear prediction model driven by harmonic and transform-based excitation. The proposed coder also uses harmonic prediction and an improved quantizer for the excitation signal. To quantize the excitation of music signals efficiently, the modulated lapped transform (MLT) is introduced. In other words, the coder combines time-domain (linear prediction) and frequency-domain techniques to achieve the best perceptual quality. At a bit rate of 4 kbps, the proposed coder showed better speech quality than the 8 kbps QCELP coder.
ISBN: 0780391543 (print)
A framework for flexible and efficient coding of general stereo audio signals is proposed. Methods based on the framework can be used together with an arbitrary single channel (mono) coder to achieve seamless transition from pure parametric stereo coding to waveform approximating coding as the bitrate is increased. The idea, based on sum-difference encoding of time-aligned signal components, is presented as a general framework. An example implementation is demonstrated to have the desired convergence properties towards transparent quality.
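Sum-difference encoding itself can be shown in a few lines. The sketch below covers only the basic mid/side transform and its inverse; the time alignment of signal components and the parametric stages described in the abstract are omitted, and the function names are assumptions.

```python
def sum_difference_encode(left, right):
    """Convert left/right channels to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def sum_difference_decode(mid, side):
    """Invert the transform: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, -0.25, 1.0]
right = [0.25, 0.75, -1.0]
mid, side = sum_difference_encode(left, right)
left_out, right_out = sum_difference_decode(mid, side)
```

When the two channels are strongly correlated, the side signal carries little energy, so a mono coder applied to the mid signal captures most of the content and the side signal can be coded coarsely or parametrically at low rates.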
This paper presents two fundamental enhancements to a hybrid audio signal model consisting of sinusoidal, transient, and noise (STN) components. The first enhancement involves a novel application of a perceptual metric for optimal time segmentation in the analysis of transients. In particular, Moore and Glasberg's model of partial loudness is modified for use with general signals and then integrated into a novel time segmentation scheme. The second, and perhaps more significant, STN enhancement is a new methodology for ranking and selecting the most perceptually relevant sinusoids. A systematic procedure is developed for the selection of a compact set of sinusoids, and comparative results are given to demonstrate the merit of this method.
Here, we propose speech-coding procedures that achieve high subjective quality while avoiding speech-specific processing and interframe exploitation. Thus, the scheme is tractable for packet-based voice communication and is capable of coding generic audio. The architecture is based on a modified discrete cosine transform (MDCT) representation of the signal, and combines efficient vector quantization (VQ) techniques with psychoacoustic principles. Weighted quantization of MDCT coefficients is performed, using a codebook based on a statistical model of the multidimensional NEXT pdf. The weighting and the codebook are adapted for each frame to account for masking thresholds given by a psychoacoustic analysis. Actual quantization is performed using lattices, thereby achieving close to rate-independent complexity. The result is a coding scheme that operates at a range of rates. Here, a particular instance at 16 kbits/s, using a sampling frequency of 8 kHz, is shown to perform better than an LD-CELP coder operating at the same rate, even though no interframe memory is exploited.
This paper presents the derivation of a new perceptual model that represents speech and audio signals by a sum of exponentially damped sinusoids. Compared to a traditional sinusoidal model, the exponential sinusoidal model (ESM) is better suited to modeling the transient segments that are readily found in audio signals. Total least squares (TLS) algorithms are applied for the automatic extraction of the modeling parameters in the ESM, i.e., the amplitude, phase, frequency and damping factors of a user-defined number of damped sinusoids. In order to turn the SNR optimization criterion of these TLS algorithms into a perceptual modeling strategy, we use the psychoacoustic model of MPEG-1 Layer 1 in a subband TLS-ESM scheme. This allows us to model each subband signal in accordance with its perceptual relevance, thereby lowering the number of modeling components required for a given modeling quality. Simulations and listening tests confirm that perceptual ESM achieves the same perceived quality as plain ESM while using substantially fewer components, and provide support for applying the new model in the fields of parametric audio processing and coding. (C) 2004 Elsevier B.V. All rights reserved.
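The signal model named in this abstract, a sum of exponentially damped sinusoids, can be written as x[n] = sum over k of a_k * exp(-d_k * n) * cos(2*pi*f_k*n + phi_k). The sketch below only synthesises a signal from given parameters; the TLS parameter extraction and the psychoacoustic subband scheme are not shown, and the parameter layout (amplitude, damping, normalised frequency in cycles per sample, phase) is an assumption made for illustration.

```python
import math

def damped_sinusoids(components, num_samples):
    """Synthesise a sum of exponentially damped sinusoids.

    components: list of (amplitude, damping, frequency, phase) tuples,
                with frequency in cycles per sample (assumed convention).
    """
    out = []
    for n in range(num_samples):
        sample = 0.0
        for amp, damp, freq, phase in components:
            envelope = amp * math.exp(-damp * n)  # decaying amplitude
            sample += envelope * math.cos(2 * math.pi * freq * n + phase)
        out.append(sample)
    return out

# One decaying component at zero frequency reduces to a pure exponential.
decay = damped_sinusoids([(1.0, 0.1, 0.0, 0.0)], 11)
# Two components: a fast-decaying transient plus a stationary partial.
mix = damped_sinusoids([(1.0, 0.5, 0.25, 0.0), (0.5, 0.0, 0.05, 0.0)], 8)
```

Setting the damping factor to zero recovers the traditional stationary sinusoidal model, which is why the damped form handles transients with fewer components.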