A 450 bps speech coder based on a multi-frame structure and multi-mode matrix quantization is presented. The multi-frame structure, consisting of four frames, is adopted to reduce the algorithmic delay. The parameter matrices are classified into different modes based on the voicing vector information of the superframe. To improve speech quality, a dynamic bit allocation scheme is developed. Experimental results show that the speech produced by the proposed vocoder is intelligible and has good naturalness.
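A minimal sketch of how a voicing-vector-driven mode decision and bit allocation could look in Python; the mode names and bit counts below are illustrative assumptions, not the paper's actual tables.

```python
import numpy as np

# Hypothetical mode table: the paper does not list its modes or bit allocations
# here, so these values are illustrative only.
MODE_BITS = {
    "all_unvoiced": {"lsf": 30, "gain": 10, "pitch": 0},
    "mixed":        {"lsf": 26, "gain": 8,  "pitch": 6},
    "all_voiced":   {"lsf": 24, "gain": 6,  "pitch": 10},
}

def classify_mode(voicing):
    """Map the per-frame voicing decisions of a 4-frame superframe to a mode."""
    voiced = sum(voicing)
    if voiced == 0:
        return "all_unvoiced"
    if voiced == 4:
        return "all_voiced"
    return "mixed"

def allocate_bits(voicing):
    """Pick the bit allocation for the superframe from its voicing vector."""
    return MODE_BITS[classify_mode(voicing)]

if __name__ == "__main__":
    superframe_voicing = [1, 1, 0, 1]   # 1 = voiced frame, 0 = unvoiced frame
    print(classify_mode(superframe_voicing), allocate_bits(superframe_voicing))
```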
The Partitioned Feature-based Classifier (PFC) is proposed in this paper. Rather than classifying each datum with the entire feature vector extracted from the original data at once, PFC partitions the features into related groups and classifies the data with each group separately. In the training stage, a contribution rate is derived from the classification accuracy of each feature group; in the testing stage, the final classification result is obtained by weighting each group's decision according to its contribution rate. The proposed PFC algorithm is applied to two audio classification problems: speech/music discrimination and music genre classification. The results demonstrate that conventional clustering algorithms improve their classification accuracy when combined with the proposed PFC model.
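A rough Python sketch of a PFC-style combination rule, using a k-nearest-neighbour base learner in place of the clustering algorithms discussed in the paper; the contribution rate is approximated here by per-group accuracy on a held-out set.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class PartitionedFeatureClassifier:
    """Sketch of a PFC-style ensemble: one classifier per feature group,
    weighted by that group's held-out accuracy (contribution rate proxy)."""

    def __init__(self, groups, base=KNeighborsClassifier):
        self.groups = groups            # list of column-index arrays
        self.base = base
        self.models, self.weights = [], []

    def fit(self, X, y, X_val, y_val):
        for cols in self.groups:
            m = self.base().fit(X[:, cols], y)
            acc = m.score(X_val[:, cols], y_val)
            self.models.append(m)
            self.weights.append(acc)
        self.weights = np.asarray(self.weights) / sum(self.weights)
        return self

    def predict(self, X):
        classes = self.models[0].classes_
        votes = np.zeros((X.shape[0], len(classes)))
        for w, m, cols in zip(self.weights, self.models, self.groups):
            votes += w * m.predict_proba(X[:, cols])   # weighted soft vote
        return classes[votes.argmax(axis=1)]
```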
A novel brain-computer interface (BCI) that depends only on the auditory modality is implemented in this study. The subject's voluntary recognition of a property (e.g. voice laterality) of a target human voice creates discriminability between brain responses to target and non-target voices presented in a random sequence. EEG data from eight subjects showed that the amplitudes of the N2 and late positive component (LPC) elicited by the target voice were significantly higher than those elicited by non-target voices. The areas of the N2 and LPC components were used as features for identifying the target among the digits 1-8, achieving an average accuracy of 89%.
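A hedged illustration of using ERP component areas as features; the sampling rate, the N2 and LPC latency windows, and the sign convention are assumptions, not values reported in the study.

```python
import numpy as np

FS = 250                      # sampling rate in Hz (assumption)
N2_WIN = (0.18, 0.30)         # N2 latency window in seconds (assumption)
LPC_WIN = (0.40, 0.70)        # late positive component window (assumption)

def component_area(epoch, window, fs=FS):
    """Signed area (sum * dt) of one averaged epoch over a latency window."""
    start, stop = int(window[0] * fs), int(window[1] * fs)
    return epoch[start:stop].sum() / fs

def epoch_features(epoch):
    return np.array([component_area(epoch, N2_WIN),
                     component_area(epoch, LPC_WIN)])

def identify_target(epochs_per_digit, weights=np.array([-1.0, 1.0])):
    """Pick the digit (1-8) whose averaged response scores highest.
    N2 is negative-going, hence the negative weight (assumption)."""
    scores = [weights @ epoch_features(trials.mean(axis=0))
              for trials in epochs_per_digit]   # trials: (n_trials, n_samples)
    return int(np.argmax(scores)) + 1
```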
Through experiments, this paper extracts and compares the MFCC and delta features of different Tibetan speakers. The results show that MFCCs are relatively stable across frames of the same speaker, while the delta features discriminate between speakers better than MFCCs do.
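A short sketch of the kind of comparison described, using librosa's default MFCC and delta settings; the paper's exact frame and filterbank parameters are not given here, so these defaults are assumptions.

```python
import numpy as np
import librosa

def mfcc_and_delta(wav_path, n_mfcc=13):
    """Extract MFCCs and their first-order deltas from one recording."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    delta = librosa.feature.delta(mfcc)                      # first-order deltas
    return mfcc, delta

def frame_stability(features):
    """Mean frame-to-frame Euclidean distance: lower means more stable."""
    diffs = np.diff(features, axis=1)
    return np.linalg.norm(diffs, axis=0).mean()
```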
ISBN: (Print) 9781424450909; 9781424450916
Singing voice and music discrimination plays a significant role in multimedia applications such as genre classification and singer identification. This paper addresses the problem of identifying sections of singing voice and of instruments. A set of classification techniques based on features extracted from auditory models, commonly used in the speech and speaker recognition domains, is investigated. All the proposed approaches assume no prior knowledge of song and music segments and use only a threshold-based distance measure for discrimination. In particular, it is observed that certain approaches are more appropriate for tracking the singer, while others are more appropriate for detecting the transitions from music to singer and vice versa. The experimental data are drawn from the RWC music genre database, which covers various styles.
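A bare-bones illustration of a threshold-based distance measure for frame labelling; the feature set, the instrumental reference model and the threshold are placeholders rather than the paper's actual approaches.

```python
import numpy as np

def label_frames(frame_features, instrumental_ref, threshold):
    """frame_features: (n_frames, dim); instrumental_ref: (dim,) mean vector.
    Frames far from the instrumental reference are labelled as singing voice."""
    dist = np.linalg.norm(frame_features - instrumental_ref, axis=1)
    return dist > threshold          # True = singer present, False = instruments

def transitions(labels):
    """Frame indices where the signal switches between music and singer."""
    return np.flatnonzero(np.diff(labels.astype(int))) + 1
```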
Jawi is an old form of Malay writing that needs to be preserved. It is therefore important to develop tools for teaching children Jawi characters, and a speech-to-text (STT) application can serve this purpose well. Unlike English, Jawi uses special characters similar to Arabic characters, yet its pronunciation follows the Malay language. This uniqueness makes STT development a challenging task. In this paper, we investigate the applicability of linear predictive coding for extracting salient features from the voice signal, and of a neural network trained with backpropagation for classifying and recognizing spoken words as Jawi characters. A total of 225 word samples covering the Jawi characters were recorded from speakers and are recognized with over 95% accuracy. The Jawi character speech-to-text engine aims to help students read Jawi documents accurately and independently, without close monitoring by parents or teachers.
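A compact sketch of the described pipeline with librosa for LPC features and scikit-learn's backpropagation-trained MLP; the LPC order, network size and data layout are assumptions, not the paper's settings.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def lpc_features(wav_path, order=12):
    """Fit one set of LPC coefficients to a short utterance and use it as a feature vector."""
    y, _ = librosa.load(wav_path, sr=16000)
    a = librosa.lpc(y, order=order)
    return a[1:]                      # drop the leading 1.0

def train_jawi_recognizer(wav_paths, labels):
    """Train a backpropagation MLP on LPC feature vectors (hypothetical data layout)."""
    X = np.vstack([lpc_features(p) for p in wav_paths])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    return clf.fit(X, labels)
```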
In this paper an FS1015 LPC coder has been designed in Matlab to produce intelligible speech. The paper focuses on different methods of implementation and compares them to determine which gives the best performance. The coder has been tested on Hindi speech. Subjective mean opinion scores, based on the ratings given by listeners, have been used to measure speech quality.
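A heavily simplified LPC analysis/synthesis frame loop in the spirit of an FS1015-style coder, for orientation only; quantization, the real pitch tracker and parameter interpolation are omitted, and the parameters are assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_analyze(frame, order=10):
    """Autocorrelation-method LPC: return predictor coefficients and residual gain."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])       # normal equations R a = r
    err = frame - lfilter(np.concatenate(([0.0], a)), [1.0], frame)
    return a, np.sqrt(np.mean(err ** 2))

def lpc_synthesize(a, gain, n, pitch_period=None):
    """Excite 1/A(z) with an impulse train (voiced) or white noise (unvoiced)."""
    if pitch_period:
        exc = np.zeros(n)
        exc[::pitch_period] = 1.0
    else:
        exc = np.random.randn(n)
    return lfilter([gain], np.concatenate(([1.0], -a)), exc)
```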
In low bit-rate coders, the near-sample and far-sample redundancies of the speech signal are usually removed by a cascade of a short-term and a long-term linear predictor. These two predictors are usually estimated sequentially, which is suboptimal. In this paper we propose an analysis model that finds the two predictors jointly by adding a regularization term to the minimization problem that imposes sparsity constraints on a high-order predictor. The result is a linear predictor that can be easily factorized into the short-term and long-term predictors. This estimation method is then incorporated into an algebraic code-excited linear prediction scheme and is shown to outperform traditional cascade methods and other joint optimization methods, offering lower distortion and higher perceptual speech quality.
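One way to prototype the sparsity-regularized high-order predictor is an L1-penalized least-squares fit over a long lag range; the predictor order and regularization weight below are assumptions, and the factorization into short-term and long-term parts is not shown.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_high_order_lp(x, order=160, alpha=0.01):
    """Solve min ||x[t] - sum_k a_k x[t-k]||^2 + alpha * ||a||_1 over long lags,
    so the solution captures both near-sample and far-sample (pitch) correlations."""
    n = len(x)
    X = np.column_stack([x[order - k - 1:n - k - 1] for k in range(order)])
    target = x[order:]
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(X, target)
    return model.coef_   # mostly zero, with clusters at low lags and near the pitch lag
```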
The success of data hiding methods, which hide data in media such as speech signals, is primarily determined by their robustness against steganalysis methods. In this paper, data hiding methods that embed secret data during MELP analysis of the speech signal using quantization index modulation are introduced. The methods in question are then examined with a steganalysis method that exploits chaotic-type features of the speech signal. By evaluating the steganalysis results, the practical usage limitations of the data hiding methods can be exposed.
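A minimal scalar quantization index modulation (QIM) sketch; in the paper the host values would be MELP parameters produced during analysis, whereas here any float parameter stands in and the step size is an assumption.

```python
import numpy as np

def qim_embed(value, bit, step=0.02):
    """Quantize the host value onto the even or odd lattice depending on the hidden bit."""
    offset = (step / 2.0) * bit
    return np.round((value - offset) / step) * step + offset

def qim_extract(value, step=0.02):
    """Recover the bit as whichever lattice lies closer to the received value."""
    return int(abs(value - qim_embed(value, 1, step)) <
               abs(value - qim_embed(value, 0, step)))
```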
The Lines Of Maximum Amplitude (LOMA) of the wavelet transform are used for glottal closure instant (GCI) detection. Following Kadambe et al. (1992), the wavelet transform modulus maxima can be used for singularity detection. The LOMA method extends this idea: all the lines chaining the maxima of a wavelet transform across scales are built, and a back-tracking procedure then selects the optimal line for each pitch period, the top of which indicates the GCI. The LOMA method is evaluated against the DYPSA algorithm (Naylor et al.), with the option of using inverse filtering as preprocessing. The LOMA method compares favorably to DYPSA, particularly in accuracy. One of the advantages of the LOMA method is its ability to cope with variations in the glottal source parameters.
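A rough sketch of the maxima-chaining idea behind LOMA, with a Ricker wavelet, a fixed set of scales and a greedy nearest-neighbour chaining rule standing in for the paper's full back-tracking procedure; wavelet choice, scales and tolerance are all assumptions.

```python
import numpy as np

def ricker(points, a):
    """Ricker (Mexican hat) wavelet sampled on `points` samples at width `a`."""
    t = np.arange(points) - points // 2
    return (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def scale_maxima(x, scales=(32, 16, 8, 4)):
    """Local maxima of the wavelet response at each scale, coarse to fine."""
    maxima = []
    for a in scales:
        w = np.convolve(x, ricker(8 * a, a), mode="same")
        idx = np.flatnonzero((w[1:-1] > w[:-2]) & (w[1:-1] > w[2:])) + 1
        maxima.append((idx, w[idx]))
    return maxima

def chain_lines(maxima, tol=20):
    """Follow each coarse-scale maximum toward finer scales; the finest-scale
    end of the strongest line within a pitch period would mark the GCI."""
    lines = []
    coarse_idx, coarse_amp = maxima[0]
    for start, amp in zip(coarse_idx, coarse_amp):
        line, total = [int(start)], float(amp)
        for idx, amps in maxima[1:]:
            if len(idx) == 0:
                break
            j = int(np.argmin(np.abs(idx - line[-1])))
            if abs(idx[j] - line[-1]) > tol:
                break
            line.append(int(idx[j]))
            total += float(amps[j])
        lines.append((line, total))
    return lines
```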