Exact keyword matching is central to popular commercial search engines. A Chinese approximate matching method with an index structure was developed to achieve better retrieval when the input contains errors. Three similarity measures between two Chinese strings were developed, based on the character edit-distance, the Pinyin edit-distance and the Pinyin improved edit-distance. The similarity measures are used to expand the user's query so that the approximate matching task can be represented as several exact matching sub-tasks. The results of these exact matches are merged and sorted by their similarity to the original query. Tests on a webpage text database gave a 50.4% recall rate and a 60.4% precision with the Pinyin improved edit-distance, with only a small increase in time and space complexity.
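The query-expansion idea can be illustrated with a minimal sketch. It assumes a standard Levenshtein distance, a similarity defined as one minus the length-normalised distance, and a syllable-level Pinyin comparison; the tiny `TOY_PINYIN` table, the function names and the threshold are hypothetical, and the abstract's "Pinyin improved edit-distance" is not reproduced here because its definition is not given.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed similarity: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical toy table; a real system would use a full Pinyin lexicon.
TOY_PINYIN = {"清": "qing", "青": "qing", "华": "hua", "花": "hua",
              "大": "da", "学": "xue"}


def to_pinyin(text):
    """Map each character to its (toneless) Pinyin syllable."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]


def char_similarity(a, b):
    return similarity(a, b)


def pinyin_similarity(a, b):
    return similarity(to_pinyin(a), to_pinyin(b))


def expand_query(query, vocabulary, threshold):
    """Return vocabulary terms similar enough to the query, best first.

    Each returned term can then be looked up with ordinary exact matching.
    """
    scored = [(pinyin_similarity(query, w), w) for w in vocabulary]
    kept = [(s, w) for s, w in scored if s >= threshold]
    return [w for s, w in sorted(kept, key=lambda p: -p[0])]
```

For example, `expand_query("清华大学", ["青花大学", "清华大学", "大学"], 0.9)` keeps the two homophonous terms and drops the shorter one, even though the character-level similarity of "清华" and "青花" is zero.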
A linear scaling (LS) based dynamic programming (DP) algorithm was developed for accurate matching in query by humming. The query contours are split into phrases, and the LS match is calculated for each phrase. Dynamic programming is then applied across all the phrases to choose the optimal matching path. Because the algorithm matches the query contour phrase by phrase, it overcomes the tendency of dynamic programming to miss the globally optimal path when matching long paths. Tests on a 5 223 MIDI database show that the algorithm outperforms the traditional LS method by 10.5%, the DP method by 6.0% and recursive alignment by 2.8% in top-1 rate. Thus, the algorithm is more efficient and more accurate while being less expensive.
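The per-phrase linear scaling step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the scale factors, the mean-subtraction used to ignore key differences, the mean absolute difference as distance, and the names `resample` and `ls_distance` are all hypothetical. The dynamic programming pass over phrases is not shown.

```python
def resample(contour, length):
    """Linearly interpolate a pitch contour to `length` points."""
    if length == 1 or len(contour) == 1:
        return [contour[0]] * length
    step = (len(contour) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def ls_distance(query, target, scales=(0.5, 0.75, 1.0, 1.25, 1.5, 2.0)):
    """Best distance between a query phrase and the start of a target,
    trying several tempo scale factors (assumed values)."""
    best = float("inf")
    for s in scales:
        n = max(2, round(len(query) * s))
        if n > len(target):
            continue  # scaled query would run past the target
        stretched = resample(query, n)
        segment = target[:n]
        # Subtract the means so that a constant key shift costs nothing.
        mq = sum(stretched) / n
        mt = sum(segment) / n
        d = sum(abs((q - mq) - (t - mt))
                for q, t in zip(stretched, segment)) / n
        best = min(best, d)
    return best
```

A query hummed faster and in a different key than the stored melody, such as `[60, 62, 64, 66]` against `[65, 66.5, 68, 69.5, 71]`, yields a distance of zero at scale 1.25 in this sketch.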
The grammars of spoken dialogue systems for information enquiry are often designed manually by experts. An automatic grammar inference method based on sentence segmentation was developed using an enhanced context-free grammar for spoken Chinese. The system parses the training sentences with an initial rule set. If the parsed syntactic tree is incomplete, the top-most constituents are used to recursively infer the missing rules after disambiguation and normalization, and the rule set is then updated. The output grammar is further improved by adjusting the order in which the training sentences are processed. Evaluations on weather forecast enquiries gave a parsing accuracy for the output grammar of 64.8% with an empty initial rule set and 86.4% with an initial rule set containing only rules for date descriptions.
Query by humming (QBH) is an important application of musical information retrieval. The key challenges in QBH are the unstructured nature of the data in audio songs and the balance between search speed and accuracy. This paper presents a data structure for audio songs that uses hand labeling to mark the melody and to divide the songs into natural segments. The search index uses the segmentation structure rather than the entire lyrics of the song. The system generates a VP-tree search structure with a multi-level searching algorithm that combines coarse searching for a fast match with dynamic time warping (DTW) for a fine match. In evaluations with 2 213 melody segments, the search time was reduced by over 40% without greatly reducing the recognition accuracy.
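The coarse-then-fine idea can be sketched in a few lines. This is a simplified illustration, not the system described: a cheap summary-feature filter stands in for the VP-tree, and the feature choice (mean pitch and pitch range), the shortlist size and all function names are assumptions.

```python
def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)   # row 0 of the cost matrix
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr.append(cost + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


def coarse_key(seq):
    """Cheap summary of a melody segment: (mean pitch, pitch range)."""
    return (sum(seq) / len(seq), max(seq) - min(seq))


def coarse_distance(a, b):
    ka, kb = coarse_key(a), coarse_key(b)
    return abs(ka[0] - kb[0]) + abs(ka[1] - kb[1])


def search(query, segments, shortlist=2):
    """Shortlist segments with the coarse distance, then pick the
    shortlisted segment with the smallest DTW distance."""
    ranked = sorted(segments,
                    key=lambda name: coarse_distance(query, segments[name]))
    candidates = ranked[:shortlist]
    return min(candidates, key=lambda name: dtw(query, segments[name]))
```

Only the shortlisted segments pay the quadratic DTW cost, which is where a time saving would come from; the trade-off is that a poor coarse filter can discard the true match before the fine stage sees it.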
The high error rate in speech recognition is largely due to local phone mismatches caused by unclear speech or noise. In this paper, we propose an approach of using local mismatch phone to improve the reliabil...
Multiword chunking is defined as a task to automatically analyze the external function and internal structure of the multiword chunk(MWC) in a sentence. To deal with this problem, we proposed a rule acquisition algori...
This paper presents a neural network based text-dependent speaker identification system for the Thai language. Linear prediction coefficients (LPC) are extracted from the speech signal and formed into feature vectors. These features are fed into a multilayer perceptron (MLP) neural network with the backpropagation learning algorithm for training and identification. The five Thai tone marks are considered closely when choosing the sentences in order to achieve the best speaker identification accuracy. Five speaking texts, one for each Thai tone, and a mixed-tone text are compared experimentally. The average identification rate on 9 speakers exceeds 95% with the mixed-tone text, while poor results occur with the middle-tone and low-tone texts, which usually produce vague or unclear voices.
This paper proposes a text-dependent speaker identification system applied to the Thai language. Isolated digits 0-9 and their concatenations are used as the speaking text. Linear prediction coefficients (LPC) are extracted and formed into feature vectors representing each speech signal. Dynamic time warping (DTW) is used to measure distances between reference and test vectors. These distances, which indicate how near the unknown vectors are to the references, are combined with the K-nearest neighbor (KNN) decision technique to decide which speaker the unknown vectors belong to. The experimental results show that the best identification rate for a single digit is 95.83% and the highest rates for concatenated digits of top-3, top-5, and top-7 are 98.75%, 100%, and 99.20%, respectively.
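The KNN decision step can be sketched as below, assuming the DTW distances between the unknown utterance and each reference have already been computed. The function name `knn_identify`, the default `k`, and the tie-breaking rule (prefer the speaker of the closest reference) are assumptions for illustration.

```python
from collections import Counter


def knn_identify(distances, k=3):
    """Pick a speaker by majority vote among the k nearest references.

    `distances` is a list of (distance, speaker) pairs for one unknown
    utterance, e.g. DTW distances to every stored reference.
    """
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(speaker for _, speaker in nearest)
    top = max(votes.values())
    tied = {speaker for speaker, count in votes.items() if count == top}
    # Assumed tie-break: the tied speaker with the closest reference wins.
    for _, speaker in nearest:
        if speaker in tied:
            return speaker
```

With `[(0.4, "A"), (0.6, "B"), (0.7, "A"), (2.0, "C")]` and `k=3`, speaker "A" wins with two of the three nearest references.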