Author affiliations: Honda Automobile Technology Research, Southfield, MI, USA; Wayne State University, Department of Computer Science, Detroit, MI 48202, USA
Publication: Knowledge and Information Systems
Year/Volume/Issue: 2016, Vol. 48, No. 1
Pages: 201-228
Subject classification: 0711 [Science - Systems Science]; 07 [Science]; 08 [Engineering]; 070105 [Science - Operations Research and Cybernetics]; 081101 [Engineering - Control Theory and Control Engineering]; 0701 [Science - Mathematics]; 071101 [Science - Systems Theory]; 0811 [Engineering - Control Science and Engineering]; 0812 [Engineering - Computer Science and Technology (degrees awarded in Engineering or Science)]
Funding: National Cancer Institute of the National Institutes of Health [R21CA175974]; US National Science Foundation [IIS-1231742, IIS-1242304, IIS-1527827] (Funding source: National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Information & Intelligent Systems)
Keywords: Rare class; Transfer learning; Class imbalance; AdaBoost; Weighted majority algorithm; Healthcare informatics; Text mining
Abstract: A fundamental problem in data mining is building robust classifiers when the data distribution is skewed. Class-imbalance classifiers are trained specifically for datasets with skewed distributions, but existing methods assume an ample supply of training examples as a prerequisite for constructing an effective classifier. When sufficient data are not available, developing a representative classifier becomes even harder because of the unequal distribution between classes. We provide a unified framework that can take advantage of auxiliary data through a transfer learning mechanism while simultaneously building a robust classifier that tackles class imbalance when only a few training samples are available in the target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are insufficient; in this paper we develop a method optimized to simultaneously augment the training data and induce balance in skewed datasets. We propose a novel boosting-based instance-transfer classifier with a label-dependent update mechanism that simultaneously compensates for class imbalance and incorporates samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification.
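The idea of boosting-based instance transfer combined with class rebalancing can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the TrAdaBoost weight-update rule (Dai et al., ICML 2007) as the instance-transfer baseline, and the function names `tradaboost_update` and `balance_by_label`, as well as the "equal total weight per class" rebalancing step, are hypothetical stand-ins for the paper's label-dependent update.

```python
import math
from collections import defaultdict


def tradaboost_update(src_w, src_miss, tgt_w, tgt_miss, n_rounds):
    """One TrAdaBoost-style reweighting step (baseline from Dai et al., 2007).

    src_w, tgt_w: current weights of auxiliary (source) and target samples.
    src_miss, tgt_miss: 1 if the weak learner misclassified the sample, else 0.
    n_rounds: total number of boosting rounds planned.
    Returns new, unnormalized (source, target) weight lists.
    """
    n_src = len(src_w)
    # Fixed factor in (0, 1): misclassified source samples lose weight.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / n_rounds))
    # Weighted error of the weak learner, measured on target samples only.
    eps = sum(w * m for w, m in zip(tgt_w, tgt_miss)) / sum(tgt_w)
    eps = min(max(eps, 1e-10), 0.499)  # keep beta_tgt strictly in (0, 1)
    beta_tgt = eps / (1.0 - eps)
    # Source: shrink weights of mistakes; target: boost weights of mistakes.
    new_src = [w * beta_src ** m for w, m in zip(src_w, src_miss)]
    new_tgt = [w * beta_tgt ** (-m) for w, m in zip(tgt_w, tgt_miss)]
    return new_src, new_tgt


def balance_by_label(weights, labels):
    """HYPOTHETICAL label-dependent step (not from the paper).

    Rescales weights so every class carries the same total weight,
    while preserving the overall sum of weights.
    """
    totals = defaultdict(float)
    for w, y in zip(weights, labels):
        totals[y] += w
    k = len(totals)
    total = sum(weights)
    return [w * total / (k * totals[y]) for w, y in zip(weights, labels)]


# Usage: one round with 4 source and 4 target samples, 10 planned rounds.
src, tgt = tradaboost_update([1.0] * 4, [1, 0, 0, 0], [1.0] * 4, [1, 0, 0, 0], n_rounds=10)
balanced = balance_by_label(tgt, [1, 1, 0, 0])
print(src, tgt, balanced)
```

In this sketch the misclassified target sample's weight triples (target error 0.25 gives a multiplier of 3), the misclassified source sample's weight shrinks, and the rebalancing step then gives both classes equal total mass. Keeping the transfer update and the class-balancing step as separate functions is only a presentation choice; the paper describes a single label-dependent update mechanism whose exact form is not reproduced here.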