In this paper, we propose a method for coding and transmitting 3D multichannel sound that transmits eight-channel signals for home reproduction together with signals representing the spatial difference between the original and eight-channel sounds. We will evaluate this method through a subjective evaluation experiment on coded signals.
ISBN: 9783902823144 (print)
MPEG Advanced Audio Coding (AAC) supports 2048-sample windows, which give quite a long delay (about 50 ms for the coder). In this paper we propose AAC Low Delay (LD) codecs with a 1024-sample window (23 ms delay) and Ultra Low Delay (ULD) codecs with a 512-sample window, which gives a delay of about 12 ms. LD and ULD can be used in real-time coding, e.g. in robot control and very fast voice communication. The design comprises complete coders and decoders with a simple bit-rate control algorithm. The proposed design was implemented in FPGA devices.
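The delay figures quoted above are consistent with the duration of one analysis window at a 44.1 kHz sampling rate. The sketch below reproduces that arithmetic. The sampling rate and the function name `frame_delay_ms` are assumptions for illustration, since the abstract states neither, and the full codec delay in a real implementation also depends on overlap and buffering.

```python
def frame_delay_ms(window_samples: int, sample_rate_hz: float = 44100.0) -> float:
    """Duration of one analysis window in milliseconds.

    The 44.1 kHz default is an assumption; the abstract does not state the rate.
    """
    return 1000.0 * window_samples / sample_rate_hz


# Window sizes taken from the abstract: standard AAC, Low Delay, Ultra Low Delay.
for name, n in [("AAC", 2048), ("LD", 1024), ("ULD", 512)]:
    print(f"{name}: {n} samples -> {frame_delay_ms(n):.1f} ms")
```

Under this assumption the 1024-sample and 512-sample windows come out at roughly 23 ms and 12 ms, matching the abstract, while the 2048-sample window lasts about 46 ms, in line with the "about 50 ms" stated for the coder.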
Starting from the orthogonal (greedy) least squares method, we build an adaptive algorithm for finding online sparse solutions to linear systems. The algorithm belongs to the exponentially windowed recursive least squares (RLS) family and maintains a partial orthogonal factorization with pivoting of the system matrix. For complexity reasons, the permutations that bring the relevant columns into the first positions are restricted mainly to interchanges between neighbors at each time instant. The storage scheme allows the computation of the exact factorization, implicitly working on indefinitely long vectors. The sparsity level of the solution, i.e., the number of nonzero elements, is estimated using information-theoretic criteria, in particular the Bayesian information criterion (BIC) and predictive least squares. We present simulations showing that, for identifying sparse time-varying FIR channels, our algorithm is consistently better than previous sparse RLS methods based on ℓ1-norm regularization of the RLS criterion. We also use our sparse greedy RLS algorithm for computing linear predictions in a lossless audio coding scheme and obtain better compression than MPEG-4 ALS using an RLS-LMS cascade.
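For orientation, the following is a minimal sketch of a plain exponentially windowed RLS filter identifying a sparse FIR channel. It is the generic baseline of the RLS family, not the sparse greedy algorithm of the abstract: it has no pivoted factorization and no BIC-based sparsity estimate. The channel `h_true`, the forgetting factor `lam`, and the initialization `delta` are hypothetical values chosen for illustration.

```python
import random


def rls_identify(x, d, order, lam=0.99, delta=100.0):
    """Exponentially windowed RLS estimate of FIR weights w, d[n] ~ sum_k w[k] x[n-k]."""
    w = [0.0] * order
    # P tracks the inverse of the exponentially weighted input correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for n in range(len(x)):
        # Regressor: the most recent `order` input samples, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        Pu = [sum(P[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(u[i] * Pu[i] for i in range(order))
        gain = [v / denom for v in Pu]
        err = d[n] - sum(w[i] * u[i] for i in range(order))
        w = [w[i] + gain[i] * err for i in range(order)]
        # P <- (P - gain * u^T P) / lam; u^T P equals Pu^T because P is symmetric.
        P = [[(P[i][j] - gain[i] * Pu[j]) / lam for j in range(order)]
             for i in range(order)]
    return w


random.seed(0)
h_true = [0.0, 0.8, 0.0, 0.0, -0.4, 0.0]  # hypothetical sparse channel
x = [random.gauss(0.0, 1.0) for _ in range(500)]
d = [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
     for n in range(len(x))]
w = rls_identify(x, d, order=len(h_true))
print([round(v, 3) for v in w])
```

Plain RLS estimates all taps, including those that are truly zero. A sparse variant such as the one described above would instead restrict the solution to a small, adaptively chosen support.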
Perceptual models exploiting auditory masking are frequently used in audio and speech processing applications such as coding and watermarking. In most cases, these models only take into account spectral masking in short-time frames. As a consequence, undesired audible artifacts in the temporal domain may be introduced (e.g., pre-echoes). In this article we present a new low-complexity spectro-temporal distortion measure. The model facilitates the computation of analytic expressions for masking thresholds, while advanced spectro-temporal models typically need computationally demanding adaptive procedures to find an estimate of these masking thresholds. We show that the proposed method gives masking predictions similar to those of an advanced spectro-temporal model at only a fraction of its computational cost. The proposed method is also compared with a spectral-only model by means of a listening test. From this test it can be concluded that, for non-stationary frames, the spectral model underestimates the audibility of introduced errors and therefore overestimates the masking curve. As a consequence, the system of interest incorrectly assumes that errors are masked in a particular frame, which leads to audible artifacts. This is not the case with the proposed method, which correctly detects the errors made in the temporal structure of the signal.
Studies by Gaver (W. W. Gaver, "How do we hear in the world? Explorations in ecological acoustics," Ecological Psychology, 1993) revealed that humans categorize everyday sounds according to the processes that generated them. He defined these categories in a taxonomy according to the aggregate states of the involved materials (solid, liquid, gas) and the physical nature of the sound-generating interaction, such as deformation or friction for solids. We exemplified this taxonomy in an everyday sound database that contains recordings of basic isolated sound events of these categories. We used a sparse method to represent and visualize these sound events. This representation relies on a sparse decomposition of sounds into atomic filter functions in the time-frequency domain. The filter functions maximally correlated with a given sound are selected automatically to perform the decomposition. The obtained sparse point pattern depicts the skeleton of the given sound. The visualization of these point patterns revealed that acoustically similar sounds have similar point patterns. To detect these similarities, we defined a novel dissimilarity function by considering these point patterns as 3-D point graphs and applied a graph matching algorithm, which assigns the points of one sound to the points of the other sound. This novel dissimilarity measure is used in combination with a kernel machine for the classification experiments, yielding an average accuracy of 95% in one-versus-one discrimination tasks.
This paper presents an audio compression method based on wavelet sub-band quantization and coding, and proposes a coder based on that method. The proposed coder uses the wavelet packet transform to obtain the critical bands of the human auditory system. Some results of the MPEG Layer 2 psychoacoustic model are used in coding the wavelet coefficients. The MPEG results are transformed to the wavelet domain in order to determine the quantizer type and the number of quantization levels for each wavelet sub-band. A method for transforming these results is also proposed. The coder uses scalar and vector quantization according to the sensitivity of the human auditory system in each wavelet sub-band. Entropy coding is also used to improve the performance of the proposed coder. The results of the subjective evaluation demonstrate that the proposed coder achieves transparent coding of monophonic CD signals at bit rates of 80-96 kbit/s.
Listening for impairments introduced by multichannel audio codecs is an important task. Classical objective methods are not adequate for assessing audio coding schemes. Accordingly, International Telecommunication Union Radiocommunication Sector (ITU-R) Recommendations BS.1116 and BS.1534-1 provide guidelines for subjective evaluation of codecs. This paper provides a tutorial on the proper conditions for reliable codec testing. The key components covered are properly designing the experiment; selecting the listening panel and training listeners; developing the test methodology; selecting balanced program material, loudspeaker or room, and sound-field requirements; listening for artifacts; and analyzing statistics. This paper addresses these various components, including the sound-field requirements, because per the ITU, "The characteristics of the reference sound field at the listening area are most important for the subjective perception of, or the quality assessment of, auditory events and their reproducibility at other listening places or rooms. These characteristics result from the interaction of the loudspeaker(s) and the listening room."
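As a rough illustration of the "analyzing statistics" step, the sketch below computes the mean listener score for one codec condition together with an approximate 95% confidence interval. The scores are hypothetical, the function name `mean_ci95` is illustrative, and the 1.96 multiplier is a normal approximation assumed here for simplicity. For small listening panels a Student-t quantile would be more appropriate, and the ITU-R recommendations themselves should be consulted for the prescribed analysis.

```python
import math
import statistics


def mean_ci95(scores):
    """Mean and approximate 95% confidence interval (normal approximation)."""
    m = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    half_width = 1.96 * s / math.sqrt(len(scores))
    return m, m - half_width, m + half_width


# Hypothetical scores from five listeners for one codec condition (0-100 scale).
scores = [80.0, 85.0, 90.0, 75.0, 70.0]
mean, low, high = mean_ci95(scores)
print(f"mean = {mean:.1f}, 95% CI = [{low:.1f}, {high:.1f}]")
```

Non-overlapping intervals between two conditions suggest a reliable difference, although a paired test across listeners is usually the more sensitive choice.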
ISBN: 9781467317092 (print)
This paper presents a cost-efficient and convenient solution for computer-aided AC measurements that students can perform either in a university lab or at home. A commercial audio codec chip with a USB interface is used to design an oscilloscope and AC generator that can be used with any personal computer without specific software drivers. This scope-generator is included in the new release of HomeLabKit, a small case containing the equipment necessary to perform the basic laboratory work of the circuit theory course.
ISBN: 9781467317443 (print)
This paper introduces a design scheme for an embedded digital voice recording system based on SOPC technology. A Nios II soft-core CPU and the corresponding interface modules are configured on an FPGA to form the embedded system's hardware, and software controls the WM8731 audio codec IC and the SDRAM, so that the system performs A/D and D/A conversion, storage, and playback of audio signals. Thanks to the use of SOPC and DMA technology, the system offers high design flexibility, good expandability, and fast data processing.
This paper investigates the use of sparse overcomplete decompositions for audio coding. Audio signals are decomposed over a redundant union of modified discrete cosine transform (MDCT) bases having eight different scales. This approach produces a sparser decomposition than the traditional MDCT-based orthogonal transform and allows better coding efficiency at low bitrates. Contrary to state-of-the-art low-bitrate coders, which are based on pure parametric or hybrid representations, our approach is able to provide transparency. Moreover, we use a bitplane encoding approach, which provides a fine-grain scalable coder that can seamlessly operate from very low bitrates up to transparency. Objective evaluation and listening tests show that the performance of our coder is significantly better than that of a state-of-the-art transform coder at very low bitrates and similar at high bitrates. We provide a link to test sound files and source code to allow better evaluation and reproducibility of the results.
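The fine-grain scalability attributed to bitplane encoding can be illustrated with a toy example: coefficient magnitudes are sent one bit plane at a time, most significant first, so truncating the stream at any plane still yields a coarser but valid reconstruction. This is a minimal sketch under stated assumptions, not the coder of the abstract. The coefficient values and the function names `bitplane_encode` and `bitplane_decode` are hypothetical, and a real coder would also entropy-code the planes.

```python
def bitplane_encode(coeffs, num_planes):
    """Return sign bits plus magnitude bit planes, most significant plane first."""
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def bitplane_decode(signs, planes, num_planes):
    """Rebuild coefficients from however many leading planes were received."""
    mags = [0] * len(signs)
    for idx, plane in enumerate(planes):
        shift = num_planes - 1 - idx  # bit position carried by this plane
        mags = [m | (b << shift) for m, b in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]


coeffs = [13, -6, 0, 3, -1]  # hypothetical quantized transform coefficients
signs, planes = bitplane_encode(coeffs, num_planes=4)
for kept in range(1, 5):
    # Decoding a prefix of the planes mimics truncating the bitstream.
    print(kept, bitplane_decode(signs, planes[:kept], num_planes=4))
```

Each additional plane halves the worst-case magnitude error, which is what lets a single bitstream serve every bitrate between the coarsest approximation and the full-precision coefficients.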