Object-based audio coding can provide new music applications with interactivity. To efficiently compress a large number of target audio objects, a subband-based parametric coding scheme has been adopted for MPEG spatial audio object coding. In this letter, the time-frequency (T/F) subband analysis structure is investigated. A reconfigured T/F structure is also proposed to improve the generation of sound scenes such as 'karaoke' and 'solo' play in interactive music scenarios. Experimental results confirm that the proposed scheme markedly improves the SNR and sound quality.
ISBN (print): 9781538663189
Parametric spatial audio coding schemes, such as advanced joint channel coding in Dolby's next-generation audio coding system AC-4, achieve a higher data compression ratio as a result of a lower-dimensional intermediate signal representation, known as the downmix. During the inverse process, the upmix, which is guided by side information, the covariance between the source signals is reconstructed to preserve perceptually important cues such as ambience or source width. In this manuscript, a systematic approach for the construction of ambience bases from weighing matrices is presented. Furthermore, the basis vectors are generalized to accommodate nonunitary mixing weights, and a new basis is derived. Round figures from internal listening tests are shared to underpin the utility of the approach.
ISBN (print): 9798350361865; 9798350361858
In parametric stereo audio coding, the encoder downmixes a stereo signal to a mono signal and transmits it along with a set of time-frequency-dependent stereo parameters. At the decoder, a decorrelator first generates a decorrelated signal from the downmix signal. A replica of the stereo signal is subsequently reconstructed from the time-frequency-dependent stereo parameters, the downmix, and the decorrelated signal. A disadvantage of traditional decorrelators is that they have trouble following the temporal envelope of the mono signal because of the frequency-dependent delays introduced in their processing. This is especially problematic for signals with strong, short energy bursts such as transients, and leads to unwanted smearing of the decorrelated signal. In this work, we introduce a cross-domain deep learning approach for reshaping a decorrelated signal's temporal envelope in the subband domain, making use of envelope features learned from the time-domain downmix.
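The encoder/decoder flow described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a single broadband parameter set instead of time-frequency tiles, a mid/side downmix, a prediction gain `alpha` and residual gain `beta` as the stereo parameters, and a plain delay as a crude stand-in decorrelator. All function and parameter names are hypothetical.

```python
import math

def encode(left, right):
    """Downmix to mono and extract two broadband stereo parameters (assumed form)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    e_mid = sum(m * m for m in mid)
    if e_mid == 0.0:
        return mid, 0.0, 0.0
    # alpha: part of the side signal that is predictable from the downmix.
    alpha = sum(s * m for s, m in zip(side, mid)) / e_mid
    # beta: relative level of the unpredictable (ambient) remainder.
    e_res = sum((s - alpha * m) ** 2 for s, m in zip(side, mid))
    beta = math.sqrt(e_res / e_mid)
    return mid, alpha, beta

def decorrelate(mid, delay=7):
    """Crude stand-in decorrelator: a plain delay (real coders use all-pass filters)."""
    delay = min(delay, len(mid))
    return [0.0] * delay + mid[:len(mid) - delay]

def decode(mid, alpha, beta, delay=7):
    """Rebuild a stereo replica from downmix, parameters and decorrelated signal."""
    decorrelated = decorrelate(mid, delay)
    side = [alpha * m + beta * d for m, d in zip(mid, decorrelated)]
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

When the two channels are fully correlated, `beta` is close to zero and the decoder reproduces the input almost exactly; the decorrelated signal only contributes when the channels differ in ways the downmix cannot predict.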
This paper deals with the application of adaptive signal models for representing transients and sinusoids at the same stage in a parametric audio coder. To accomplish this goal, we search for sparse approximations by means of matching pursuit with a mixed dictionary, instead of using two different dictionaries that operate in cascade. To this end, complex exponentials and wavelet packets are chosen for modeling the tonal and transient features of an audio signal, respectively. At each iteration of the pursuit, the mixed dictionary function that extracts the most energy from the residue is selected. This function will be either a complex exponential or a wavelet packet, depending on the characteristics of the residue at that iteration. Experimental results clearly show the objective (compression rate) and subjective (% preference) advantages of the mixed dictionary over two cascaded dictionaries. The approach proposed in this paper is successfully applied to parametric audio coding, achieving better perceptual audio quality than MPEG-2/4 AAC at 16 kbit/s for most of the CD-quality single-channel audio signals considered for testing. © 2005 Elsevier B.V. All rights reserved.
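The selection rule in this abstract (at each iteration, pick the dictionary function that extracts the most energy from the residue) can be sketched as below. This is an illustrative toy, not the authors' implementation: real sinusoids and unit impulses stand in for the complex exponentials and wavelet packets used in the paper, and the names `build_mixed_dictionary` and `matching_pursuit` are hypothetical.

```python
import math

def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def build_mixed_dictionary(n):
    """Return (label, unit-norm atom) pairs from two families.

    Assumption: cos/sin atoms model tonal content and unit impulses model
    transients; the paper uses complex exponentials and wavelet packets.
    """
    atoms = []
    for k in range(1, n // 2):
        atoms.append((("cos", k), normalize(
            [math.cos(2 * math.pi * k * t / n) for t in range(n)])))
        atoms.append((("sin", k), normalize(
            [math.sin(2 * math.pi * k * t / n) for t in range(n)])))
    for p in range(n):
        atoms.append((("impulse", p), [1.0 if t == p else 0.0 for t in range(n)]))
    return atoms

def matching_pursuit(signal, atoms, iterations):
    """Greedy pursuit over the single mixed dictionary."""
    residue = list(signal)
    picks = []
    for _ in range(iterations):
        # For unit-norm atoms, the largest |inner product| removes the most energy.
        label, atom, coeff = max(
            ((lab, a, sum(r * x for r, x in zip(residue, a))) for lab, a in atoms),
            key=lambda item: abs(item[2]),
        )
        residue = [r - coeff * x for r, x in zip(residue, atom)]
        picks.append((label, coeff))
    return picks, residue
```

For a signal made of one sinusoid plus one click, the pursuit should pick one atom from each family, since both compete in the same search rather than in two cascaded stages.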
In this paper we propose an improved sinusoidal modeling method based on perceptual matching pursuits computed in the Bark scale for parametric audio coding applications. Complex exponentials compose the overcomplete dictionary for matching pursuits. The main contribution is the minimization of a perceptual distortion measure defined in the Bark scale to select the optimum atom at each iteration of the pursuits. Furthermore, a psychoacoustic stopping criterion for the pursuits is presented. As the experimental results show, the proposed sinusoidal modeling method is suitable for integration into a parametric audio coder based on the three-part model of sines, transients and noise (STN model). Our method provides significant advantages over previous work, mainly because it operates in the Bark scale rather than in the frequency domain. © 2008 Elsevier Inc. All rights reserved.
In this letter, we propose joint quantization of the parameters of a set of sinusoids based on the theory of trellis-coded quantization. A particular advantage of this approach is that it allows for joint quantization of a variable number of sinusoids, which is particularly relevant in variable rate parametric audio coding. Under high-resolution assumptions and based on a perceptually relevant distortion measure, we derive analytical expressions for the optimal design subject to an entropy constraint. Numerical experiments show a significant performance gain compared to optimal spherical quantization at the cost of a slight increase in computational complexity.
This paper describes the Molecular Matching Pursuit (MMP), an extension of the popular Matching Pursuit (MP) algorithm for the decomposition of signals. The MMP is a practical solution which introduces the notion of structures within the framework of sparse overcomplete representations; these structures are based on the local dependency of significant time-frequency or time-scale atoms. We show that this algorithm is well adapted to the representation of real signals such as percussive audio signals. This is at the cost of a slight sub-optimality in terms of the rate of convergence for the approximation error, but the benefits are numerous, most notably a significant reduction in the computational cost, which facilitates the processing of long signals. Results show that this algorithm is very promising for high-quality adaptive coding of audio signals.
ISBN (print): 9781467300469
In most parametric stereo audio coders, sets of spatial parameters are extracted from the audio channels in a time-frequency domain. In order to reduce the amount of data, the parameter plane is highly down-sampled and transmitted together with a mono downmix. Then, in the decoding process, it is necessary to interpolate the upmix matrix computed from these parameters. Usually, this is done in the same way for each portion of the signal, regardless of its nature. In this article, we propose a dynamic strategy of window splitting, parameter estimation, and interpolation of the upmix matrix based on transient detection in the audio signal. Subjective tests show an improvement when the strategy is applied to the new parametric stereo tool from MPEG USAC.
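The contrast between uniform and transient-aware interpolation of the upmix matrix can be sketched as follows. This is only an illustration under stated assumptions: linear interpolation is assumed for the uniform case, and a hard switch at a hypothetical `transient_at` index stands in for the paper's dynamic strategy, whose details the abstract does not give.

```python
def interpolate_upmix(prev, curr, length, transient_at=None):
    """Return one upmix matrix per time slot between two parameter positions.

    prev, curr   : matrices (lists of rows) at the previous and current positions
    length       : number of time slots to fill
    transient_at : hypothetical slot index of a detected transient; when given,
                   hold `prev` before it and switch to `curr` from it onward
    """
    matrices = []
    for n in range(length):
        if transient_at is None:
            weight = (n + 1) / length          # uniform linear cross-fade
        else:
            weight = 0.0 if n < transient_at else 1.0
        matrices.append([
            [(1 - weight) * p + weight * c for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ])
    return matrices
```

With the uniform cross-fade, parameters belonging to the signal after a transient leak into the slots before it; the hard switch avoids that at the price of an abrupt matrix change, which is acceptable only because the transient itself masks it.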
ISBN (print): 9781479928934
Transform-based audio coders are the preferred technique for music data compression. However, at low bitrates, traditional coders based on the Modified Discrete Cosine Transform are prone to strong warbling and roughness artifacts originating from sparsely coded tonal components. Parametric coders, in turn, suffer from an unpleasantly artificial sound and do not scale well up to perceptual transparency. Hybrid transform-based and parametric coding could potentially overcome the limits of the individual approaches. Yet, existing hybrid coders are hampered by the lack of integrative interplay between both techniques. We outline our ideas on how to tightly integrate transform-based coding and parametric coding to obtain enhanced perceptual quality and scalability. Also, we provide listening test results which demonstrate the benefits of our hybrid coder design.
ISBN (print): 9781479903566
Sinusoidal models are widely used in parametric speech and audio coding schemes. A common requirement in these applications is to select only a subset of components that provide the greatest perceptual benefit, particularly at low bitrates. Usually, perceptual sinusoidal component selection algorithms make use of greedy algorithms that are computationally expensive. In this paper, we present a new algorithm that selects sinusoidal components based on the partial loudness model proposed by Moore & Glasberg. We compare the performance of the proposed algorithm in terms of perceptual benefit and computational complexity to other existing sinusoidal selection algorithms.