Mining from ambiguous data is an important problem in data mining. This paper discusses one such task, known as the multi-instance problem. In the multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all of its instances are negative, and positive if it contains at least one positive instance. Because the instances in a positive bag are not labeled, each positive bag is ambiguous. The goal of mining is to classify unseen bags. The main idea of existing multi-instance algorithms is to find the true positive instances in positive bags, convert the multi-instance problem into a supervised problem, and obtain the labels of test bags from the predicted labels of their unknown instances. In this paper, we approach multi-instance data from another point of view: excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called multi-instance covering kNN (MICkNN) for mining from multi-instance data. First, the constructive covering algorithm is used to restructure the original multi-instance data. Then, the kNN algorithm is applied to identify the false positive instances. In the test stage, an unseen bag is labeled directly according to its similarity to the sphere neighbors obtained in the previous two steps. Experimental results demonstrate that the proposed algorithm is competitive with most state-of-the-art multi-instance methods in both classification accuracy and running time.
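The three stages described above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' MICkNN implementation. The covering step is a simplified constructive covering (each sphere's radius is the midpoint between the nearest opposite-label instance and the farthest same-label instance inside that distance, with centers taken in index order rather than at random). The false-positive rule (drop a positive sphere whose center's k nearest training instances mostly come from negative bags) and the test rule (a bag is positive if any of its instances is nearest to a retained positive sphere) are hypothetical stand-ins. The function names `flatten_bags`, `cca_cover`, `drop_false_positives`, `predict_bag` and the toy data are also illustrative assumptions.

```python
import numpy as np


def flatten_bags(bags, bag_labels):
    """Stack all instances; each inherits its bag's label (tentative for positive bags)."""
    X = np.vstack([np.asarray(b, dtype=float) for b in bags])
    y = np.concatenate([np.full(len(b), lab, dtype=int) for b, lab in zip(bags, bag_labels)])
    return X, y


def cca_cover(X, y):
    """Simplified constructive covering: label-pure spheres as (center_idx, radius, label).

    Centers are taken in index order for reproducibility; classic CCA picks them at random.
    """
    uncovered = np.ones(len(X), dtype=bool)
    spheres = []
    while uncovered.any():
        i = int(np.flatnonzero(uncovered)[0])
        d = np.linalg.norm(X - X[i], axis=1)
        other = y != y[i]
        # distance to the nearest opposite-label instance
        d1 = d[other].min() if other.any() else np.inf
        # farthest same-label instance that is still closer than d1
        same_inside = ~other & (d < d1)
        d2 = d[same_inside].max() if same_inside.any() else 0.0
        r = (d1 + d2) / 2 if np.isfinite(d1) else d2
        spheres.append((i, float(r), int(y[i])))
        # mark same-label instances inside the sphere as covered
        uncovered &= ~(~other & (d <= r))
    return spheres


def drop_false_positives(spheres, X, y, k=3):
    """Hypothetical kNN rule: discard a positive sphere if most of its center's k nearest
    training instances (center excluded) come from negative bags."""
    kept = []
    for idx, r, lab in spheres:
        if lab == 1:
            d = np.linalg.norm(X - X[idx], axis=1)
            d[idx] = np.inf  # exclude the center itself
            nn = np.argsort(d)[:k]
            if (y[nn] == 0).sum() > k / 2:
                continue  # sphere sits in a negative region: treat as false positive
        kept.append((idx, r, lab))
    return kept


def predict_bag(bag, spheres, X):
    """Hypothetical test rule: positive iff some instance's nearest sphere
    (smallest distance to the sphere surface) is a positive sphere."""
    centers = X[[idx for idx, _, _ in spheres]]
    radii = np.array([r for _, r, _ in spheres])
    labels = np.array([lab for _, _, lab in spheres])
    for x in np.asarray(bag, dtype=float):
        gap = np.linalg.norm(centers - x, axis=1) - radii  # negative = inside the sphere
        if labels[np.argmin(gap)] == 1:
            return 1
    return 0


# Toy data: two negative bags near the origin, two positive bags with true
# positives near (10, 10); (0.5, 0.5) in the first positive bag is a false positive.
bags = [[[0, 0], [0, 1]], [[1, 0], [1, 1]], [[0.5, 0.5], [10, 10]], [[10, 11]]]
bag_labels = [0, 0, 1, 1]
X, y = flatten_bags(bags, bag_labels)
spheres = cca_cover(X, y)
kept = drop_false_positives(spheres, X, y, k=3)

print(predict_bag([[0.2, 0.3], [9.5, 10.5]], kept, X))  # → 1
print(predict_bag([[0.5, 0.5], [0.2, 0.8]], kept, X))   # → 0
```

With these toy bags the covering step produces six spheres, and the kNN rule discards the small positive sphere around (0.5, 0.5) because its nearest training instances all come from negative bags. Passing the unfiltered `spheres` to `predict_bag` instead would label the second test bag positive.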