We propose a new frequency-domain bandwidth extension (BWE) technology. It generates high-frequency signals by combining FFT-based frequency-domain gain shaping with linear prediction coding (LPC) based spectral envelope shaping. To preserve the amount of noise in the reconstructed band, gain reduction controlled by a spectral flatness measure (SFM) is employed. Subjective test results show that the presented technology performs comparably to 3GPP AMR-WB+ at the same bit rate within the framework of the Audio Video coding Standard of China (AVS) Part 10, the mobile speech and audio codec. The technology has been formally adopted as the artificial high-band coding module in AVS P10.
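The SFM-controlled gain reduction mentioned in this abstract can be illustrated with a minimal sketch. The flatness measure below is the standard ratio of geometric to arithmetic mean of a power spectrum. The function `reduced_gain`, its `max_reduction` parameter, and the direction of the mapping (flatter band, lower gain) are assumptions made for illustration only; the abstract does not specify the rule used in AVS P10.

```python
import math

def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like band; values near 0
    indicate a peaky, tonal band.
    """
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arith_mean = sum(power_spectrum) / n + eps
    return math.exp(log_mean) / arith_mean

def reduced_gain(gain, sfm, max_reduction=0.5):
    """Hypothetical rule (not from the paper): lower the band gain
    linearly as the flatness measure rises."""
    return gain * (1.0 - max_reduction * sfm)

flat_band = [1.0] * 8             # noise-like band
peaky_band = [8.0] + [1e-6] * 7   # tonal band

sfm_flat = spectral_flatness(flat_band)
sfm_peaky = spectral_flatness(peaky_band)
```

Under these assumptions the flat band is attenuated the most and the tonal band is left almost untouched.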
In this paper, we propose to combine the Kalman filter with a recent speech enhancement technique, called the phase spectrum compensation procedure, or PSC. More specifically, we apply the PSC technique to initialise the Kalman filter, whereby PSC is used to clean the noisy speech prior to LPC estimation for the Kalman recursion. We refer to the combined technique as the Kalman-PSC filter. Using an objective speech quality measure, formal subjective listening tests and spectrogram analysis, we show that the proposed method results in improved speech quality.
This paper presents a speech enhancement approach to improve the performance of robust speaker identification in noisy environments. Start and end point detection, silence removal, frame segmentation and windowing are used for pre-processing, and a Wiener filter is used to remove the silence parts from the speech utterances. To extract features from the speech, several speech parameterization techniques, namely LPC, LPCC, RCC, MFCC, ΔMFCC and ΔΔMFCC, have been simulated. Finally, to measure the performance of the proposed speech enhancement techniques, a genetic algorithm is used as the classifier for the noise-robust automated speaker identification system, and various experiments have been performed on the genetic algorithm to select the optimum parameters. On the NOIZEUS speech database, the highest identification rates achieved are 70.31% for the text-dependent and 61.26% for the text-independent speaker identification system.
The use of a linear periodic controller (LPC) has been proposed as a new approach in the field of model reference adaptive control. The resulting controller can handle rapid changes in plant parameters, and it provides smooth transient behavior for the closed-loop system. Moreover, the LPC generates control signals that are modest in size when measured using the infinity norm. Despite these advantages, the LPC suffers from poor noise tolerance: the smaller the sampling time, the less noise tolerant the controller is. In this work, to alleviate this drawback, we apply a larger probing signal whose size is inversely proportional to the sampling time. The proposed method has significantly better noise rejection, at the cost of a larger control signal.
ISBN: 9781424442133 (print)
Speech processing has been an active area for several decades with a wide variety of applications ranging from communications to automatic reading machines. There are many speech recognition techniques, which are based on statistical techniques as well as neural networks. The present work investigates the feasibility of two approaches for solving the problem using neural networks.
In this paper, we revisit the manifold assumption, which has been widely adopted in learning-based image super-resolution. The assumption states that point-pairs from the high-resolution (HR) manifold share the local geometry of the corresponding low-resolution (LR) manifold. However, the assumption does not always hold, since the one-to-many mapping from LR to HR makes neighbor reconstruction ambiguous and results in blurring and artifacts. To minimize this ambiguity, we use Locality Preserving Constraints (LPC), which avoid confusion by explicitly emphasizing the consistency of localities on both manifolds. The LPC are combined with a MAP framework and realized by building a set of cell-pairs on the coupled manifolds. Finally, we propose an energy minimization algorithm for the MAP with LPC, which reconstructs higher quality images than previous methods. Experimental results show the effectiveness of our method.
Speaker diarization systems attempt to assign temporal speech segments in a conversation to the appropriate speaker, and non-speech segments to non-speech. In essence, they answer the question "Who spoke when?". One inherent deficiency of most current systems is their inability to handle co-channel or overlapped speech. During the past few years, several studies have attempted to deal with the problem of overlapped or co-channel speech detection and separation. However, most of the suggested algorithms perform only under specific conditions, have high computational complexity, and require both time and frequency domain analysis of the audio data. In this study, frame-based entropy analysis of the audio data in the time domain serves as a single feature for an overlapped speech detection algorithm. Identification of overlapped speech segments is performed using Gaussian Mixture Modeling (GMM) along with well-known classification algorithms applied to two-speaker conversations. This methodology eliminates the need to set a hard threshold for each conversation or database. The LDC CALLHOME American English corpus is used to evaluate the suggested algorithm. The proposed method successfully detects 60.0% of the frames labeled as overlapped speech by the baseline (ground-truth) segmentation, while keeping a 5% false-alarm rate.
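The frame-based time-domain entropy feature described in this abstract might be computed along the following lines. This is only a sketch under stated assumptions: the amplitude histogram, the bin count, the amplitude range, and the frame and hop lengths are all illustrative choices, not values taken from the paper, and the GMM classification stage is not shown.

```python
import math

def frame_entropy(frame, bins=16, lo=-1.0, hi=1.0):
    """Shannon entropy (bits) of the amplitude histogram of one frame.

    Assumes samples are normalized to [lo, hi]; out-of-range samples
    are clipped into the first or last bin.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in frame:
        idx = int((x - lo) / width)
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    total = len(frame)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def entropy_track(signal, frame_len=160, hop=80):
    """Entropy per frame; frame_len and hop are hypothetical settings."""
    last_start = len(signal) - frame_len
    return [frame_entropy(signal[i:i + frame_len])
            for i in range(0, last_start + 1, hop)]

silence = [0.0] * 400
# One sample at the centre of each of the 16 bins: maximally spread frame.
spread = [-1.0 + (k + 0.5) * 0.125 for k in range(16)]
```

The resulting one-dimensional entropy track would then serve as the single input feature to whatever classifier is chosen.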
Methods for generating the excitation parameters and gain are proposed that improve on plain linear predictive coding (LPC) in terms of speech reconstruction quality. Using the DCT to concentrate the excitation signal energy in the first few coefficients gives a better reconstruction of the innovation signal at the receiver than the primitive method of pitch estimation. Computing the gain as the root mean square (RMS) value of the voltage levels in each frame reduces complexity compared with plain LPC, where the gain is calculated in terms of the mean square error.
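The two ingredients named in this abstract, a per-frame RMS gain and a DCT whose first few coefficients carry most of the excitation energy, can be sketched as follows. The orthonormal DCT-II, the helper `keep_first`, and the example frame are assumptions for illustration; the abstract does not state which DCT variant or how many coefficients the authors transmit.

```python
import math

def rms_gain(frame):
    """Root mean square value of the samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def dct2(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out

def keep_first(coeffs, k):
    """Hypothetical truncation: keep the first k coefficients, zero the rest."""
    return coeffs[:k] + [0.0] * (len(coeffs) - k)

frame = [1.0, 2.0, 3.0, 4.0]
coeffs = dct2(frame)
approx = idct2(keep_first(coeffs, 2))   # reconstruction from 2 of 4 coefficients
```

Because the transform is orthonormal, the energy of the kept coefficients directly measures how much of the frame energy survives truncation.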
Telecommunication is a highly active field of research, and speech compression is a vital part of communication. Voice excited linear predictive coding (VELP) is widely used for speech coders with high compression rates. This paper presents a real-time analysis of VELP, implemented on a TMS320C6711 DSP kit using Simulink RTW (Real-Time Workshop). It covers the Simulink model of VELP, VELP analysis, VELP synthesis, and the VELP implementation on the DSP kit. In brief, the signal is passed through an analyzer, which generates the filter coefficients and a residual signal. The residual error signal has less redundancy than the original speech signal and can be quantized with fewer bits than the original speech. The residual error signal is transmitted to the receiver along with the filter coefficients. At the receiver, the speech is reconstructed by passing the residual error signal through the synthesis filter. The human speech production system is modeled with an all-pole model (also known as the linear prediction model). For the real-time analysis, VELP is implemented on the DSP kit using a MATLAB Simulink model.
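The analysis and synthesis path described in this abstract can be sketched in a few lines. This is not the authors' Simulink model: the autocorrelation method with the Levinson-Durbin recursion, the prediction order, and the test signal are assumptions, and the residual is passed through unquantized so that the round trip is exact up to rounding.

```python
import math

def autocorr(x, order):
    """Autocorrelation lags 0..order of a frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns predictor coefficients a[0..order-1] such that
    x[n] is approximated by sum(a[k] * x[n-1-k]).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def analysis(x, a):
    """Analysis (inverse) filter: residual = signal minus its prediction."""
    return [x[n] - sum(a[k] * x[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

def synthesis(e, a):
    """All-pole synthesis filter: rebuild the signal from the residual."""
    y = []
    for n in range(len(e)):
        pred = sum(a[k] * y[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] + pred)
    return y

speech = [math.sin(0.3 * n) for n in range(64)]   # stand-in for a speech frame
coeffs = levinson(autocorr(speech, 2), 2)
residual = analysis(speech, coeffs)
rebuilt = synthesis(residual, coeffs)
```

In a real coder the residual would be quantized before transmission, so the reconstruction would be approximate rather than exact.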
ISBN: 9781424414833 (print)
This paper considers the problem of predictive fusion coding for storage of multiple spatio-temporally correlated sources so as to enable efficient selective retrieval of data from subsets of sources as designated by future queries. Only statistical information about future queries is available during encoding. While temporal correlations can be exploited by coding over large blocks, the growth in encoding complexity renders this approach impractical, hence the interest in a low-complexity predictive coding approach. However, the design of optimal predictive fusion coding systems is considerably complicated by the presence of the prediction loop and the potentially exponential growth of the query sets. We propose a complexity-constrained predictive fusion coder and derive an iterative algorithm for its design, which is based on the "Asymptotic Closed Loop" framework and hence circumvents the convergence and stability issues of traditional predictive quantizer design. The proposed predictive fusion coder optimizes the tradeoff between distortion and retrieval rate, given a fixed storage capacity, and provides significant gains over storage schemes that perform only joint compression or memoryless fusion coding of all sources.