Audio streaming applications have become very popular in recent years, owing to their low cost and convenience. However, during network congestion, data packets are often delayed or discarded, creating an annoying gap in the streamed media. This letter presents a new approach to audio packet loss concealment designed for MPEG-audio streaming applications. In previous work, we introduced a receiver-based concealment algorithm that applies the gapped-data amplitude and phase estimation (GAPES) interpolation algorithm in the discrete short-time Fourier transform (DSTFT) complex domain, and obtained better results than earlier methods. The current approach applies the same algorithm in a different complex domain, formed by taking the modified discrete cosine transform (MDCT) coefficients as the real part and the modified discrete sine transform (MDST) coefficients as the imaginary part. The new approach significantly reduces computational complexity while maintaining similarly high quality.
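The complex domain described in this abstract can be illustrated with a small sketch. It assumes the common textbook definitions of the MDCT and MDST with a phase offset of N/2 + 1/2 and omits the analysis window; the function name `mdct_mdst_complex` is hypothetical, and the letter's exact normalization and sign conventions may differ.

```python
import math

def mdct_mdst_complex(frame):
    """Return N complex values MDCT[k] + 1j * MDST[k] for a frame of 2N samples.

    Assumed (textbook) definition, no window applied:
        phase(n, k) = pi / N * (n + 1/2 + N/2) * (k + 1/2)
        MDCT[k] = sum_n x[n] * cos(phase),  MDST[k] = sum_n x[n] * sin(phase)
    """
    if len(frame) % 2 != 0:
        raise ValueError("frame length must be even (2N samples)")
    n_half = len(frame) // 2
    n0 = 0.5 + n_half / 2.0
    coeffs = []
    for k in range(n_half):
        re = 0.0
        im = 0.0
        for n, x in enumerate(frame):
            phase = math.pi / n_half * (n + n0) * (k + 0.5)
            re += x * math.cos(phase)  # MDCT contribution (real part)
            im += x * math.sin(phase)  # MDST contribution (imaginary part)
        coeffs.append(complex(re, im))
    return coeffs

# A unit impulse yields unit-magnitude coefficients in every bin.
spectrum = mdct_mdst_complex([1.0, 0.0, 0.0, 0.0])
magnitudes = [abs(c) for c in spectrum]
```

This direct form costs O(N^2) per frame and is only meant to show how the two real transforms pair up into one complex spectrum, which gives each bin the amplitude and phase that an interpolator such as GAPES operates on.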
ISBN: 9781424413119 (print)
During the last decade, new applications have emerged in mobile and network multimedia, wireless multimedia communication, audio/video teleconferencing, remote assistance, digital storage systems, secure audio transmission, and so on. To meet these requirements, tremendous research effort has gone into developing efficient digital audio coding technologies. In China, the AVS-M audio standard is one such technology targeting mobile multimedia applications; it is developed and owned by the China Audio and Video Coding Standard Workgroup. This paper discusses the AVS-M audio standard by describing the technical principles of its encoding and decoding, the state of standardization, and the suitability of the codec with respect to available technology, economic feasibility, and market needs. It concludes with a brief discussion of future research directions.
ISBN: 9781424418343 (print)
We extend the recently proposed spectral-integration-based psychoacoustic model for sinusoidal distortions to the MDCT domain. The estimated masking threshold additionally depends on the sub-band spectral flatness measure of the signal, which accounts for the non-sinusoidal distortion introduced by masking. Expressions for the masking threshold are derived, and the validity of the proposed model is established through perceptual transparency tests on audio clips. Test results indicate that the new model achieves transparent-quality reconstruction. The model is compared with the MPEG psychoacoustic models in terms of estimated perceptual entropy (PE). The results show that the proposed model predicts a lower PE than the other models.
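The sub-band spectral flatness measure that the masking threshold depends on can be sketched as follows. This uses the standard definition (geometric mean of the power spectrum divided by its arithmetic mean); the function names, the `eps` floor, and the band edges are assumptions for illustration, and the abstract does not give the mapping from flatness to masking threshold, so it is not shown.

```python
import math

def spectral_flatness(coeffs, eps=1e-12):
    """Geometric mean over arithmetic mean of the power in one band.

    Values near 1 indicate a noise-like (flat) band; values near 0
    indicate a tonal (peaky) band. eps avoids log(0) on empty bins.
    """
    power = [c * c + eps for c in coeffs]
    log_mean = sum(math.log(p) for p in power) / len(power)
    arith_mean = sum(power) / len(power)
    return math.exp(log_mean) / arith_mean

def subband_flatness(coeffs, band_edges):
    """Flatness per sub-band; band_edges are increasing bin indices (hypothetical layout)."""
    return [
        spectral_flatness(coeffs[lo:hi])
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]

# First band is flat (noise-like), second band holds a single strong peak (tonal).
mdct_bins = [1.0, 1.0, 1.0, 1.0, 10.0, 0.0, 0.0, 0.0]
flatness = subband_flatness(mdct_bins, [0, 4, 8])
```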
ISBN: 9781605603162 (print)
Linear prediction (LP) is a valuable tool for speech analysis and coding, due to the efficiency of the autoregressive model for speech signals. In audio analysis and coding, the sinusoidal model is much more popular, which is partly due to the poor performance of audio LP. By examining audio LP from a spectral estimation point of view, we observe that the distribution of the audio signal's dominant frequencies in the Nyquist interval is a critical factor determining LP performance. In this framework, we describe five existing alternative LP methods and illustrate how they all attempt to solve the observed frequency distribution problem.
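For reference, conventional LP in the autocorrelation formulation can be sketched with the Levinson-Durbin recursion. This is the standard baseline that the abstract contrasts with, not one of the five alternative methods it surveys; the function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite signal (no normalization)."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (a, err) where a[0] == 1 and the prediction error is
    e[n] = sum_i a[i] * x[n - i]; err is the final error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Autocorrelation of an ideal first-order AR process with pole at 0.5.
coeffs, energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

With these inputs the recursion recovers a first-order predictor (the second coefficient is zero), which is the expected behaviour when the signal really is autoregressive. The abstract's point is that audio often violates that assumption.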
ISBN: 9781605603162 (print)
In this paper, we present a 6.8-32 kbit/s scalable speech and audio coder using a modified-discrete-cosine-transform (MDCT)-based bandwidth extension on top of a 6.8 kbit/s code-excited-linear-prediction (CELP) coder. The proposed coder comprises a 6.8 kbit/s narrowband CELP coder as its core layer and eight enhancement layers with bitrates of 0.8, 1.2, 3.2, or 4.0 kbit/s. After the core layer encodes a narrowband signal, the first enhancement layer extends the bandwidth of the decoded narrowband signal, and the other enhancement layers increase the fidelity of the extended wideband signal or its robustness against frame erasure. Subjective evaluation results demonstrate that the proposed coder outperforms G.729.1, particularly for music signals at 16 and 24 kbit/s, with competitive or even better performance under other conditions such as clean speech, background noise, and frame erasure.
ISBN: 0819453676 (print)
This paper proposes a simple implementation of an audio virtual surround sound effect and a novel scheme for multichannel coding using a virtual prediction technique. We used Head Related Transfer Function (HRTF) data to produce virtual surround sound channels and mixed them into the original stereo channels. The resulting effects passed subjective evaluation and were implemented in real time on a Motorola DSP. Using the virtual prediction technique, redundancies between channels can be removed in multichannel coding, and a new coding scheme and method based on it are given. This helps to decrease the bitrate or enhance the quality of multichannel audio coding. The feasibility and results are discussed at the end of the paper.
Factorial pulse coding, a method known to efficiently code an information signal using unit-magnitude pulses, involves the computation of combinatorial functions. These computations are highly complex because they require many multiply and divide operations on multi-precision numbers, especially when the signal is long or many unit-magnitude pulses are used for coding. In this paper, we propose a very low complexity method for approximating these combinatorial functions. The approximate functions satisfy a property that preserves the unique decodability of the factorial packing encoding/decoding algorithm. The low-complexity computation enables the use of factorial packing to encode and decode 144 MDCT coefficients using 28 unit-magnitude pulses in the audio coding mode of the EVRC-WB speech coding standard, without affecting the number of bits required for coding.
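To see why the exact combinatorial functions are costly, the sketch below counts the distinct signed pulse configurations for n positions and m unit-magnitude pulses using the standard counting formula, N(n, m) = sum over d of C(n, d) * C(m - 1, d - 1) * 2^d, where d is the number of occupied positions. It relies on Python's arbitrary-precision integers, which is exactly the multi-precision arithmetic the paper's approximation avoids; the function names are hypothetical and the paper's approximation itself is not reproduced here.

```python
import math

def fpc_combinations(n, m):
    """Number of length-n integer vectors whose absolute values sum to m."""
    if m == 0:
        return 1
    total = 0
    for d in range(1, min(n, m) + 1):
        positions = math.comb(n, d)           # which d positions are non-zero
        magnitudes = math.comb(m - 1, d - 1)  # split m pulses into d non-empty groups
        signs = 1 << d                        # one sign bit per occupied position
        total += positions * magnitudes * signs
    return total

def fpc_bits(n, m):
    """Bits needed to index every configuration: ceil(log2(N(n, m)))."""
    return (fpc_combinations(n, m) - 1).bit_length()

# Small case that can be checked by hand: |x| + |y| = 2 has 8 integer solutions.
small = fpc_combinations(2, 2)
# The configuration named in the abstract: 144 coefficients, 28 pulses.
bits_144_28 = fpc_bits(144, 28)
```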
A generative model of the human voice is presented, based on a number of pseudo-physical considerations. For robustness, observation noise is also included in the model. An EM-algorithm framework for inference and learning is then described. The presented instance of approximate inference and subsequent learning allows the extraction of voice parameters that can be used in structured coding applications. Compared with non-parametric approaches, this set of parameters allows a great amount of compression, flexibility in modifying pitch, duration, and breathiness, and noise-free synthesis.
Parametric Spatial Audio Coding (PSAC) is a promising method that compresses multi-channel signals into extremely compact, backward-compatible representations. However, some implementations (e.g. BCC) tend to produce noise in their output audio signals, which degrades listening quality. In this paper, a low-complexity quality enhancement method is presented.
A new low-bitrate audio coding technology (called "ExAC") based on enhanced audio coding (EAC) and spectral band replication (SBR) is introduced. The major building blocks of the coding scheme are explained: EAC works as the core coder and SBR as a powerful bandwidth extension module. The new coding technology provides a high-quality audio compression scheme for a broad range of applications, including the high-density laser video diskette, HDTV, and very low bitrate applications such as AM audio broadcasting and streaming.