ISBN: (print) 0780374908
Learning accurate Bayesian network (BN) classifiers from complete databases is a very active research topic in data mining and machine learning. In practice, however, databases are rarely complete, which limits the real-world data mining applications of these classifiers. This paper investigates methods for learning four well-known types of Bayesian network classifiers from incomplete databases: Naive-Bayes, tree augmented Naive-Bayes, BN augmented Naive-Bayes, and general BN. The latter two are learned using dependency-analysis-based algorithms that work only under the assumption that the database is complete. To enable such algorithms to handle missing data, this paper introduces a novel deterministic method for estimating the (conditional) mutual information from incomplete databases, which can then be used for conditional independence (CI) tests, a fundamental step in dependency-analysis-based algorithms. The experimental results show that our algorithm is efficient and reliable.
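For orientation, the quantity that a CI test relies on can be sketched as follows. This is not the deterministic estimator introduced in the paper, which the abstract does not specify; it is a simple available-case baseline that assumes missing entries are marked with `None` and ignores any record in which either variable is unobserved. The function name and the `missing` parameter are illustrative.

```python
import math
from collections import Counter

def mutual_information(xs, ys, missing=None):
    """Empirical mutual information I(X;Y) in nats.

    Assumption: records with a missing value in either variable are
    simply skipped (available-case analysis). This is a baseline, not
    the estimator proposed in the paper.
    """
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not missing and y is not missing]
    n = len(pairs)
    if n == 0:
        return 0.0  # nothing observed jointly, so no evidence of dependence
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

A CI test would compare such an estimate against a small threshold. Skipping incomplete records is only unbiased when values are missing completely at random, which is one reason a dedicated estimator is of interest.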
This paper proposes a new method to improve the canonical Katz back-off smoothing technique in language modeling. The Katz smoothing process is analyzed in detail, and global discounting parameters are selected for discounting. Furthermore, a modified formula for the discounting parameters is proposed, in which the parameters are determined not only by the occurrence counts of the n-gram units but also by the low-order history frequencies. This modification makes the smoothing more reasonable for n-gram units that have homophonic histories (histories with the same pronunciation). The new method is tested on a Chinese Pinyin-to-character conversion system, where Pinyin is the pronunciation string, and the results show that the improved method achieves a surprising reduction in both perplexity and Chinese character error rate.
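As background, the global discounting parameters of canonical Katz smoothing can be sketched as below. This covers only the standard Good-Turing-based discount ratios d_r, not the modified formula that the paper proposes, which the abstract does not give. The function name, the `count_of_counts` mapping (n_r, the number of n-grams seen exactly r times), and the cutoff `k` are illustrative assumptions.

```python
def katz_discounts(count_of_counts, k=5):
    """Canonical Katz discount ratios d_r for counts r = 1..k.

    count_of_counts[r] is n_r, the number of distinct n-grams observed
    exactly r times. Counts above k are left undiscounted (d_r = 1).
    Assumes n_1 .. n_{k+1} are all present and non-zero.
    """
    n1 = count_of_counts[1]
    # Term shared by every d_r: (k + 1) * n_{k+1} / n_1
    common = (k + 1) * count_of_counts[k + 1] / n1
    discounts = {}
    for r in range(1, k + 1):
        # Good-Turing adjusted count r* = (r + 1) * n_{r+1} / n_r
        r_star = (r + 1) * count_of_counts[r + 1] / count_of_counts[r]
        discounts[r] = (r_star / r - common) / (1 - common)
    return discounts
```

With these ratios, the total count mass removed from seen n-grams equals n_1, which is the mass that Good-Turing estimation assigns to unseen events. Note that every d_r here depends only on the counts r, which is the property the abstract's modification relaxes by also using low-order history frequencies.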
Hidden Markov Model (HMM) has been applied to the problem of machine recognition of Chinese handwriting. The character image is segmented into a number of local regions and feature vectors of these regions are extract...
Support vector machine (SVM) has been proved to be a powerful tool for solving practical pattern recognition problems based on learning from data. Due to large number of support vectors learnt from huge amount of trai...
This paper presents a new approach named Primitive-based Coupled-HMM for human natural complex action recognition. First, the system proposes a hybrid human model and employs 2-order B-spline function to detect the tw...
This paper reveals some equivalences between automata based on complete residuated lattice-valued logic (called (?)-valued automata) and the truth-value lattice of the underlying logic (i.e., the residuated lattice). In particular, it demonstrates several basic equivalent characterizations of the retrievability of (?)-valued automata. Finally, the connections of the homomorphisms between two (?)-valued automata to continuous mappings and open mappings are clarified. The paper thus further develops a more profound theory of fuzzy automata.
This paper proposed a new feature extraction method for Chinese character recognition by using optimized Gabor filters. Based on the theory of Gabor filters and the statistical information of Chinese character images,...
This paper proposes a new syntactic annotation scheme, the functional chunk, which aims to represent information about grammatical relations between sentence-level predicates and their arguments. Under this scheme, we built a Chinese chunk bank of about two million Chinese characters and developed several learned models for automatically annotating fresh text with functional chunks. We also propose a two-stage approach to building a Chinese treebank on top of the chunk bank, and give experimental results for a chunk-based syntactic parser that show the advantage of functional chunks for improving parsing performance. This work lays a good foundation for a further research project to build a large-scale Chinese treebank.
We derive a lower bound on the inconclusive probability of unambiguous discrimination among n linearly independent quantum states by using the no-signaling constraint. The bound improves on the one presented by Zhang, Feng, Sun, and Ying [Phys. Rev. A 64, 062103 (2001)]; when the optimal discrimination can be reached, the two bounds coincide. An alternative method of constructing an appropriate measurement to prove the lower bound is also presented.