Classification of large datasets is a classical problem in machine learning. In this paper, we focus on effective classification algorithms for large and imbalanced datasets. First, to deal with imbalanced datasets, we define class weights according to the sizes of the positive and negative sets. Then, we propose a fast learning algorithm for large datasets, called the core set weighted support vector machine (CSWSVM). In the proposed approach, the corresponding core set (CS) is obtained with the core vector machine (CVM) or the generalized CVM (GCVM), and a weighted support vector machine (WSVM) is then used to classify the imbalanced data. Experimental results on the UCI and USPS datasets demonstrate that the proposed method is effective.
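The abstract does not give the weighting formula, so the following is only a minimal sketch of one common choice: inverse class-frequency weights, under which both classes carry the same total weight. The function name `class_weights` and the +1/-1 label encoding are assumptions made for illustration, not the paper's definitions.

```python
def class_weights(labels):
    """Return a weight per class, inversely proportional to the class size.

    Assumed scheme (not taken from the paper): each class receives the same
    total weight, so a rare positive class is not swamped by the negatives.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), -1: total / (2 * n_neg)}


# One positive example against nine negatives.
labels = [1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
weights = class_weights(labels)
```

Weights of this kind are typically used to scale the per-sample misclassification penalty of a weighted SVM, so that errors on the minority class cost more.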
Traditional rough set theory (TRS) uses equivalence relations to define the upper and lower approximation sets of a given target concept, which allows uncertainties in information systems to be represented. Because it relies on equivalence relations, TRS only considers whether attribute values can be distinguished, and ignores the preference information contained in those values. Rough sets based on dominance relations solve this problem effectively and can deal with preference-ordered data. In these dominance-based approaches, the cost of computing the dominance classes greatly affects the efficiency of attribute reduction and rule extraction. This paper presents an efficient method of computing dominance classes in an ordered information system by rapidly reducing the search space. Based on the definition of a dominance class, the inferior class of an object is gradually removed from the universe as more attributes are considered during the computation. Experiments on ten UCI data sets show that the proposed algorithm clearly improves the efficiency of computing dominance classes, especially for large-scale data.
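The idea of shrinking the search space attribute by attribute can be illustrated with a small sketch. It is not the paper's algorithm: it only applies the standard definition of a dominating class (all objects at least as good as x on every attribute) and discards failing objects as soon as one attribute rules them out. The table layout and the name `dominating_class` are assumptions for illustration.

```python
def dominating_class(table, x, attributes):
    """Return the objects y with table[y][a] >= table[x][a] for every a."""
    candidates = set(table)
    for a in attributes:
        # Objects that fail on attribute a are dropped and never checked again,
        # so later attributes are compared against a smaller candidate set.
        candidates = {y for y in candidates if table[y][a] >= table[x][a]}
    return candidates


# Hypothetical ordered information system: larger values are preferred.
table = {
    "o1": {"math": 3, "physics": 2},
    "o2": {"math": 2, "physics": 2},
    "o3": {"math": 3, "physics": 1},
    "o4": {"math": 1, "physics": 3},
}
result = dominating_class(table, "o2", ["math", "physics"])
```

Here `o4` is eliminated on the first attribute and `o3` on the second, leaving `o1` and `o2` itself.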
The dominance relation rough set approach (DRSA) is a useful mathematical tool for dealing with preference-ordered data. Its main idea is to replace the equivalence relations of classical rough set theory with dominance relations. However, the conventional definition of the dominance relation is very strict, which may limit its application to information systems with a relatively large number of attributes. In this paper, we relax the conditions in the definition of the dominance relation and introduce the concept of an extended dominance relation. The properties of this new concept are also discussed, and it is found that all the properties of the classical dominance relation are still satisfied.
Pathfinding is a typical task in many computer games, and its performance affects the quality of game AI. To enhance the efficiency of multi-task pathfinding, case-based reasoning has been introduced into the traditional A* algorithm, giving the CBMT method. The method needs to select representative paths that cover the whole map in order to build a compact case base, which is difficult for large maps. In addition, repeatedly searching for similar cases for each pathfinding task is time-consuming. To address these problems, we provide a kd-tree case storage structure and case retrieval mechanism for the CBMT method. The pre-stored cases (previously found paths) are generated randomly and incrementally. The original flat storage structure of the cases is replaced by a kd-tree. Because branch pruning reduces the search space during case retrieval, pathfinding efficiency improves noticeably, and the number of searched nodes is also reduced.
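Case retrieval with branch pruning can be sketched as a nearest-neighbour query on a kd-tree. The abstract does not say how cases are keyed, so this example assumes each stored path is indexed by a 4-D point (start x, start y, goal x, goal y); the function names and the dictionary-based node layout are likewise illustrative assumptions rather than the CBMT implementation.

```python
def build_kdtree(points, depth=0):
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(node, query, best=None):
    """Return the stored point closest to query, pruning hopeless branches."""
    if node is None:
        return best
    if best is None or sq_dist(query, node["point"]) < sq_dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best


# Hypothetical case keys: (start_x, start_y, goal_x, goal_y) of stored paths.
cases = [(0, 0, 9, 9), (1, 8, 7, 2), (5, 5, 0, 0), (9, 0, 0, 9), (2, 2, 8, 8)]
tree = build_kdtree(cases)
query = (2, 1, 8, 8)
match = nearest(tree, query)
```

Compared with scanning a flat list, the pruning test lets the search skip whole subtrees whose region cannot contain a closer case.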
The Deep Web can provide a great amount of high-quality information. To make full use of this information, it is becoming urgent to establish a Deep Web data integration system, in which Deep Web interface integr...
The support vector machine (SVM) is a machine learning classification technique based on statistical learning theory. A quadratic optimization problem needs to be solved in the algorithm, and with the increase of the s...
This paper presents a new method for mining the hottest topics on Chinese webpages, based on an improved k-means partitioning algorithm. The dictionary used for word segmentation is reduced by deleting words that are useless for clustering, and a dictionary tree is built from it, which speeds up word segmentation. Words are coded as integers, so that each title is represented as a set of integers, which greatly reduces the space and time cost of clustering. Having to determine the value of k is a shortcoming of stream data mining based on k-means. In the new method, the value of k is adjusted during clustering, which improves both accuracy and speed.
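The coding step can be illustrated with a short sketch. It assumes the titles have already been segmented into words, and it assigns integer codes in order of first appearance; the abstract does not specify the coding scheme, so both the scheme and the name `encode_titles` are assumptions for illustration.

```python
def encode_titles(segmented_titles):
    """Map each word to an integer code and each title to a set of codes."""
    vocab = {}
    encoded = []
    for words in segmented_titles:
        codes = set()
        for word in words:
            if word not in vocab:
                # Assumed scheme: codes are handed out in order of first appearance.
                vocab[word] = len(vocab)
            codes.add(vocab[word])
        encoded.append(frozenset(codes))
    return vocab, encoded


# Hypothetical, already-segmented titles.
titles = [
    ["stock", "market", "falls"],
    ["market", "rally"],
    ["stock", "market", "falls"],
]
vocab, encoded = encode_titles(titles)
```

Comparing two titles then reduces to set operations on small integers instead of string comparisons, which is where the saving in space and time would come from.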
In this paper, we study set operations on type-2 fuzzy sets. We first discuss the join and meet operations on membership grades of type-2 fuzzy sets under left-continuous t-norms and derive the distributive law for type-2 fuzzy sets. Then, some properties of compositions of fuzzy relations are discussed. We show that the distributive laws under union and composition of type-2 fuzzy relations are valid. An example shows the failure of the distributive laws under intersection and composition.
Distribution network cabling planning is a very complex project. This paper proposes the application of intelligent decision support technology in power systems. By adding a module library and the concept of model management systems, the Intelligent Power Service System provides intelligent decision support for distribution network cabling planning using dynamic programming, spatial data mining, and decision tree techniques, and has a certain amount of self-learning ability.
This paper presents a reasoning algorithm based on interaction with fuzzy rule matrix transformation and applies it to completing patterns. The completed patterns are then used in training and synthetic judgment. The investigation shows that the method is effective and may be widely used in reasoning with incomplete knowledge.