Unlabeling data can improve classification accuracy

Authors: Lausser, Ludwig; Schmid, Florian; Schmid, Matthias; Kestler, Hans A.

Affiliations: Univ Ulm, Inst Neural Informat Proc, Res Grp Bioinformat & Syst Biol, D-89069 Ulm, Germany; Univ Munich, Dept Stat, D-80539 Munich, Germany

Publication: PATTERN RECOGNITION LETTERS (Pattern Recogn. Lett.)

Year, volume, issue: 2014, Vol. 37, No. 1

Pages: 15-23

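Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be awarded)]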

Funding: Karl-Steinbuch grant; German Federal Ministry of Education and Research (BMBF) [PKB-01GS08, 0315894A]; Deutsche Forschungsgemeinschaft (DFG) [SCHM 2966/1-1, SFB 1074]

Keywords: Partially supervised learning; Transductive learning; Semi-supervised learning; Classification; Microarray data

Abstract: In this study we focus on the effects of sample limitations on partially supervised learning algorithms. We analyze the performance of these types of learning algorithms on small datasets under varying trade-offs between labeled and unlabeled samples. In contrast to the typical settings for partially supervised learning algorithms, the number of available unlabeled samples is also restricted. We utilize gene expression datasets, which are typical examples of data collections of small sample size. DNA microarrays are used to generate these profiles by measuring thousands of mRNA values simultaneously. These profiles are increasingly used for tumor categorization. Partially labeled microarray datasets occur naturally in the diagnostic setting if the corresponding labeling process is time consuming or expensive (e.g., early relapse vs. late relapse). Surprisingly, the best classification results in our study were not always achieved for a maximal proportion of labeled samples. This is unexpected, as asymptotic results for an unlimited number of samples suggest that a labeled sample is of exponentially higher value than an unlabeled one. Our analysis shows that in the case of finite sample sizes a more balanced trade-off between labeled and unlabeled samples is optimal. This trade-off was not unique over all experiments. It could be shown that the optimal trade-off between unlabeled and labeled samples depends mainly on the chosen learning algorithm. © 2013 Elsevier B.V. All rights reserved.
