Given a baseline speech coder and speech with an available phonetic class segmentation, a number of potential enhancements to that coder become possible. While the quality of speech segmentation by phoneme and phonetic class is constantly improving, we use TIMIT to generate phonetic class segmentation as a basis for initial testing of these techniques. Using coders drawn from the MELP family, we explore specialized phonetic codebooks, phonetically driven superframing, and improved modeling of specific phonetic classes and the transitions between them. We compare these enhancements against the base coder in terms of computational cost, transmission cost, and the quality of the reconstructed speech. In most cases, we find that segmentation-based coders can produce speech with quality comparable to that of MELP, using fewer transmitted bits and at no additional computational cost. With phonetic codebooks and transition modeling, CCR tests show that these segmentation-based coders produce speech of better quality than MELP.
Model-based speech coders such as the mixed-excitation linear prediction (MELP) coder encode parameters of the autoregressive model for short-duration frames of the speech signal. Typically, parameters extracted from successive frames by the MELP coder exhibit strong correlation. The transmitted data rate can be reduced if the encoders for these parameters effectively exploit this inter-frame correlation. In this paper, we apply a procedure called dynamic codebook re-ordering (DCR) to reduce the entropy of the symbols generated by the vector quantization encoders used in coding the MELP parameters. The entropy reduction is achieved by exploiting the correlation between the vectors of MELP parameters derived from successive speech frames. Compared with other techniques that exploit inter-frame correlation, DCR significantly reduces the data rate without introducing additional coding delay or increasing distortion, and it is simple and elegant.
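The idea of re-ordering a codebook between frames can be illustrated with a small sketch. The abstract does not give the re-ordering rule, so the one used here (rank all code vectors by squared Euclidean distance to the previously selected code vector and transmit the rank instead of the raw index) is an assumption, and the names `dcr_encode`, `dcr_decode`, and `entropy` are hypothetical. Because the decoder can rebuild the same ordering from its own previous output, no side information or extra delay is needed in this sketch.

```python
import math
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vec):
    """Index of the code vector closest to vec (plain vector quantization)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], vec))


def reorder(codebook, prev):
    """Codebook indices sorted by distance to the previously chosen vector.

    Assumed ordering rule; ties are broken by the original index so that
    encoder and decoder always agree.
    """
    return sorted(range(len(codebook)),
                  key=lambda i: (sq_dist(codebook[i], codebook[prev]), i))


def dcr_encode(codebook, frames):
    """Emit, for each frame, the rank of its code vector in the re-ordered codebook."""
    symbols, prev = [], None
    for vec in frames:
        idx = nearest(codebook, vec)
        # The first frame has no predecessor, so its raw index is sent.
        symbols.append(idx if prev is None else reorder(codebook, prev).index(idx))
        prev = idx
    return symbols


def dcr_decode(codebook, symbols):
    """Recover the original codebook indices from the rank symbols."""
    indices, prev = [], None
    for s in symbols:
        idx = s if prev is None else reorder(codebook, prev)[s]
        indices.append(idx)
        prev = idx
    return indices


def entropy(symbols):
    """Empirical zeroth-order entropy of a symbol sequence, in bits per symbol."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())
```

When successive frames are correlated, the transmitted ranks cluster around small values, so an entropy coder applied to them needs fewer bits than one applied to the raw indices, while the quantized vectors themselves are unchanged.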
Adaptive detection has a rich history in the radar community, and a number of other areas have borrowed heavily from constructs developed in this field. The task of target detection in hyperspectral imaging (HSI) is o...
We propose a blind multiuser detector based on Monte Carlo Markov chain (MCMC) techniques. The detector exploits mutually orthogonal complementary sequences to distinguish between transmitting users and space-time cod...
This paper describes a low-rate feedback algorithm for conveying partial channel state information - specifically, the dominant row subspace of the channel matrix - from the receiver to the transmitter in a continuous...
This paper presents two new imaging algorithms for detecting the positions of subsurface targets, e.g., land mines, using seismic waves. They are based on the CLEAN algorithm and its high resolution version RELAX. Thi...
This paper presents the framework for an ultra low bit rate speech vocoder. The system is based on a recognition-synthesis paradigm in which a single ergodic hidden Markov model (EHMM) is used to capture the statistical characterizations of speech in a flexible manner capable of limiting the effects of recognition errors. Because predetermined speech units are not used, this system has the advantage of not requiring a transcription for the training data set. By incorporating a mixed excitation scheme based on an improved MELP formulation into the EHMM, additional gains in quality and speaker characterization are achieved at no cost to the bit rate.
The paper presents two new imaging algorithms for detecting the positions of subsurface targets, e.g., land mines, using seismic waves. They are based on the CLEAN algorithm and its high resolution version RELAX. The paper shows how the CLEAN and RELAX algorithms can be modified to work in a multi-static active array setup for detecting passive targets. Seismic surface waves reflected from various targets are collected at a receiving array and processed to locate the reflectors. The modified imaging algorithms are demonstrated to work for experimental seismic data that include mines and rocks (clutter).
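For readers unfamiliar with CLEAN, the following is a minimal one-dimensional sketch of the generic algorithm: repeatedly find the strongest peak in the residual, record a fraction of it as a point component, and subtract the correspondingly scaled point-spread function. It is not the multi-static seismic modification described in the abstract, and the function name `clean_1d`, the loop gain, and the stopping threshold are all illustrative assumptions.

```python
def clean_1d(dirty, psf, gain=0.5, threshold=1e-3, max_iter=1000):
    """Generic 1-D CLEAN deconvolution (illustrative sketch only).

    dirty: observed signal (point sources blurred by psf).
    psf:   odd-length point-spread function, centered, with peak value 1.
    Returns (components, residual).
    """
    residual = list(dirty)
    components = [0.0] * len(dirty)
    half = len(psf) // 2
    for _ in range(max_iter):
        # Locate the strongest remaining peak.
        peak = max(range(len(residual)), key=lambda i: abs(residual[i]))
        if abs(residual[peak]) < threshold:
            break
        # Attribute a fraction of the peak to a point source at that position.
        amp = gain * residual[peak]
        components[peak] += amp
        # Remove that source's blurred response from the residual.
        for k, p in enumerate(psf):
            i = peak + k - half
            if 0 <= i < len(residual):
                residual[i] -= amp * p
    return components, residual
```

In an imaging setting the peaks of `components` would be read as candidate reflector positions; the residual shows what the point-source model has not explained.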
The paper presents a new memory-efficient distributed arithmetic (DA) architecture for high-order FIR filters. The proposed architecture is based on a memory reduction technique for DA look-up tables (LUTs); it requires fewer transistors for high-order filters than the original LUT-based DA, DA-offset binary coding (DA-OBC), and the LUT-less DA-OBC. Recursive iteration of the memory reduction technique significantly increases the maximum filter order implementable on an FPGA platform, not only by reducing the transistor count but also by balancing hardware usage between logic elements (LEs) and memory. FPGA implementation results confirm that the proposed DA architecture can implement a 1024-tap FIR filter with significantly smaller area usage (<50%) than the original LUT-based DA and the LUT-less DA-OBC.
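As background, the original LUT-based DA that the abstract uses as its baseline can be sketched in software as follows. This is a bit-serial model of the textbook scheme, not the proposed memory-reduced architecture; integer coefficients, two's-complement inputs, and the names `build_lut`, `da_dot`, and `da_fir` are assumptions made for illustration.

```python
def build_lut(coeffs):
    """Precompute every possible sum of coefficients, addressed by a K-bit word.

    Bit i of the address selects coeffs[i]; the table has 2**K entries.
    """
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_dot(lut, samples, bits):
    """Inner product of the coefficients with `samples` using only LUT reads,
    shifts, and adds. Samples must fit in `bits`-bit two's complement."""
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement bit patterns
    acc = 0
    for j in range(bits):
        # Gather bit j of every sample into one LUT address.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> j) & 1) << i
        term = lut[addr] << j
        # The most significant bit carries negative weight in two's complement.
        acc += -term if j == bits - 1 else term
    return acc


def da_fir(coeffs, signal, bits=8):
    """FIR-filter `signal` with `coeffs` via distributed arithmetic."""
    lut = build_lut(coeffs)
    history = [0] * len(coeffs)  # history[i] holds x[n - i]
    out = []
    for x in signal:
        history = [x] + history[:-1]
        out.append(da_dot(lut, history, bits))
    return out
```

The table built by `build_lut` has 2^K entries for a K-tap filter, which is why a direct LUT-based DA becomes impractical at high orders and why memory reduction matters for a 1024-tap design.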