Imbalanced problems are pervasive in many real-world applications. In imbalanced distributions, one or more classes, called minority class(es), are under-represented compared to the other classes. This skewness in the underlying data distribution causes many difficulties for typical machine learning algorithms, and the problem becomes even harder when algorithms must combat multi-class imbalance. Existing solutions for tackling imbalanced distributions generally fall into two main categories: data-oriented methods and model-based algorithms. Focusing on the latter, this paper suggests a blend of the boosting and over-sampling paradigms, called MDOBoost, to bring considerable benefits to the learning of multi-class imbalanced data sets. The over-sampling technique introduced and adopted in this paper, the Mahalanobis distance-based over-sampling technique (MDO for short), is incorporated into the boosting algorithm. The minority classes are over-sampled via MDO in such a way that they largely preserve the original minority class characteristics. Compared with the popular method in this field, SMOTE, MDO generates minority class examples that are more similar to the original class samples. Moreover, MDO provides a broader representation of minority class examples, which in turn leads the classifier to build larger decision regions. MDOBoost increases the generalization ability of a classifier, since it yields better results with the pruned version of the C4.5 classifier, unlike other over-sampling/boosting procedures, which have difficulties with pruned C4.5. MDOBoost is applied to real-world multi-class imbalanced benchmarks and its performance is compared with several data-level and model-based algorithms. The empirical results and theoretical analyses reveal that MDOBoost offers superior advantages compared to popular class de…
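The core MDO idea, synthesizing minority samples that keep each seed sample's Mahalanobis distance to the class mean, can be sketched in a few lines. This is a simplified reading (a diagonal covariance is assumed for brevity, and the function name is ours), not the paper's exact procedure:

```python
import math
import random

def mdo_oversample(minority, n_new, rng=None):
    """Sketch of Mahalanobis-distance-based over-sampling (MDO).

    Simplifying assumption: the class covariance is treated as diagonal,
    so each synthetic sample keeps the seed sample's Mahalanobis distance
    to the class mean but points in a random direction.
    """
    rng = rng or random.Random(0)
    d = len(minority[0])
    n = len(minority)
    mean = [sum(x[j] for x in minority) / n for j in range(d)]
    var = [max(sum((x[j] - mean[j]) ** 2 for x in minority) / n, 1e-12)
           for j in range(d)]
    std = [math.sqrt(v) for v in var]
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(minority)
        # Mahalanobis distance of the seed (norm in standardized space)
        r = math.sqrt(sum(((seed[j] - mean[j]) / std[j]) ** 2 for j in range(d)))
        # random direction, rescaled to the same standardized norm
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        gn = math.sqrt(sum(c * c for c in g)) or 1.0
        synthetic.append([mean[j] + r * g[j] / gn * std[j] for j in range(d)])
    return synthetic
```

Because every synthetic point inherits a real seed's standardized distance, the new points spread along the class's own density contours rather than only along segments between neighbors, which is the contrast with SMOTE drawn above.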
We describe a new boosting algorithm that is the first to be both smooth and adaptive. These two features enable performance improvements for many learning tasks whose solutions use a boosting technique. The boosting approach was originally suggested for the standard PAC model; we analyze possible applications of boosting in the context of agnostic learning, which is more realistic than the PAC model. We derive a lower bound for the final error achievable by boosting in the agnostic model and show that our algorithm actually achieves that accuracy (within a constant factor). We note that the idea of applying boosting in the agnostic model was first suggested by Ben-David, Long and Mansour (2001); the solution they give is improved in the present paper. The accuracy we achieve is exponentially better with respect to the standard agnostic accuracy parameter beta. We also describe the construction of a boosting "tandem" whose asymptotic number of iterations is the lowest possible (in both gamma and epsilon) and whose smoothness is optimal in terms of O(·). This allows adaptively solving problems whose solutions are based on smooth boosting (such as noise-tolerant boosting and DNF membership learning), while preserving the original (non-adaptive) solution's complexity.
Image classification is of great importance for digital photograph management. In this paper we propose a general statistical learning method based on a boosting algorithm to perform image classification for photograph annotation and management. The proposed method employs both features extracted from image content (i.e., color moment and edge direction histogram) and features from the EXIF metadata recorded by digital cameras. To fully exploit potential feature correlations and improve classification accuracy, feature combination is needed. We incorporate the linear discriminant analysis (LDA) algorithm to form linear combinations of selected features and generate new combined features. The combined features are used along with the original features in the boosting algorithm to improve classification performance. To make the proposed learning algorithm more efficient, we present two heuristics for selective feature combination, which significantly reduce training computation without losing performance. The proposed image classification method has several advantages: small model size, computational efficiency, and improved classification performance based on LDA feature combination. (c) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
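The LDA step described above, which produces a combined feature as a linear projection, can be illustrated for the two-class, two-feature case. The helper below is a generic Fisher-discriminant sketch (the function name and data are ours, not the paper's implementation):

```python
def fisher_direction(X0, X1):
    """Two-class Fisher/LDA direction for 2-D features.

    Returns w such that w . x is the combined feature separating
    class 0 from class 1 (a sketch of LDA-based feature combination).
    """
    def mean(X):
        return [sum(x[0] for x in X) / len(X), sum(x[1] for x in X) / len(X)]
    m0, m1 = mean(X0), mean(X1)
    # pooled within-class scatter matrix (2x2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for X, m in ((X0, m0), (X1, m1)):
        for x in X:
            d = [x[0] - m[0], x[1] - m[1]]
            s[0][0] += d[0] * d[0]; s[0][1] += d[0] * d[1]
            s[1][0] += d[1] * d[0]; s[1][1] += d[1] * d[1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    dm = [m1[0] - m0[0], m1[1] - m0[1]]
    # w = Sw^{-1} (m1 - m0), via the closed-form 2x2 inverse
    return [(s[1][1] * dm[0] - s[0][1] * dm[1]) / det,
            (-s[1][0] * dm[0] + s[0][0] * dm[1]) / det]
```

The projection w . x then joins the original features in the boosting round, so the weak learners can threshold the combined feature directly.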
ISBN (Print): 9781467317139
Traditional classification algorithms have difficulty dealing with imbalanced data. This paper proposes a classification algorithm called CascadeBoost, which combines the advantages of the boosting algorithm and the cascade model to learn from imbalanced data. The cascade model balances the pre-training data by gradually reducing the number of majority-class samples; the most informative samples are then gradually selected by the boosting algorithm based on the weight distribution. The experimental results show that the proposed method obtains better performance compared to other methods.
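The cascade side of this design can be caricatured as follows: each stage trains on all minority samples plus a shrinking majority subset. The sketch below uses random under-sampling in place of the paper's boosting-weight-based selection, and all names are ours:

```python
import random

def cascade_stages(majority, minority, n_stages=3, rng=None):
    """Sketch of the cascade idea: every stage keeps all minority
    samples and a majority subset that halves stage by stage, so the
    training data becomes progressively more balanced.

    (Selecting the most informative samples via boosting weights, as
    the paper does, is replaced here by random sampling.)
    """
    rng = rng or random.Random(0)
    stages = []
    for s in range(n_stages):
        k = max(len(minority), len(majority) // (2 ** s))
        k = min(k, len(majority))
        stages.append((rng.sample(majority, k), list(minority)))
    return stages
```

Each stage's classifier thus sees an increasingly balanced class ratio, which is the mechanism the abstract credits for handling imbalance.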
Rising Carbon Dioxide (CO2) levels from human activities are driving climate change. Carbon capture and storage (CCS) during enhanced oil recovery (EOR) in underground reservoirs offers both environmental and economic benefits. This method boosts oil production, cuts greenhouse gas emissions, and supports sustainable energy. Precise well placement in CO2-EOR is a crucial task for effective oil displacement, but traditional reservoir simulators are costly. This study explores and compares boosting algorithms, as fast surrogate models, to achieve accurate well placement during CO2-EOR in light oil carbonate reservoirs. The research considers various reservoir scenarios with different geological heterogeneity levels (i.e., homogeneous, moderately heterogeneous, and highly heterogeneous reservoirs). Various parameters, such as injection and production well locations, the distance between production and injection wells in an inverted five-spot pattern, pattern angle, and injection and production rates, are explored using a compositional reservoir simulator to assess their impact on the well placement problem. A comprehensive analysis of various boosting algorithms, including AdaBoost, CatBoost, Gradient boosting, LightGBM, and XGBoost, is performed using the simulated dataset to assess their efficacy. The results demonstrate that LightGBM outperformed the other algorithms with the lowest Mean Absolute Error and Root Mean Square Error of 115.3 × 10^6 $ and 188.2 × 10^6 $, respectively. Additionally, it demonstrates exceptional speed, averaging 3 to 8 times faster than the other boosting algorithms across the three reservoir scenarios. This superior performance, coupled with its efficient runtime, makes LightGBM the ideal choice for the study objectives. Moreover, the mass balance approach highlights the significant CO2 storage efficiency, emphasizing the effectiveness of CO2-EOR in storing CO2 in underground heterogeneous reservoirs.
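The boosting regressors compared above share one core loop: repeatedly fit a weak learner to the current residuals and add it with a shrinkage factor. Below is a dependency-free gradient-boosting sketch with depth-1 stumps, assuming 1-D inputs and squared error; it illustrates the shared mechanism, not any specific library's implementation:

```python
def fit_stump(x, r):
    """Best single-split regression stump on 1-D inputs (squared error)."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for cut in range(1, len(x)):
        t = (x[order[cut - 1]] + x[order[cut]]) / 2
        left = [r[i] for i in order[:cut]]
        right = [r[i] for i in order[cut:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((v - lm) ** 2 for v in left)
               + sum((v - rm) ** 2 for v in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def gradient_boost(x, y, rounds=100, lr=0.2):
    """Fit stumps to residuals, shrinking each contribution by lr."""
    base = sum(y) / len(y)
    pred = [base] * len(x)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, resid)
        stumps.append(h)
        pred = [p + lr * h(v) for p, v in zip(pred, x)]
    return lambda v: base + lr * sum(h(v) for h in stumps)
```

LightGBM, XGBoost, and CatBoost differ mainly in how they grow deeper trees, handle histograms and categorical features, and parallelize this loop, which is where the speed differences reported above come from.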
This paper presents a strategy to improve the AdaBoost algorithm with a quadratic combination of base classifiers. We observe that learning this combination is necessary to get better performance and is possible by constructing an intermediate learner operating on the combined linear and quadratic terms. This is not trivial, as the parameters of the base classifiers are not under direct control, obstructing the application of direct optimization. We propose a new method realizing iterative optimization indirectly. First we train a classifier by randomizing the labels of training examples. Subsequently, the input learner is called repeatedly with a systematic update of the labels of the training examples in each round. We show that the quadratic boosting algorithm converges under the condition that the given base learner minimizes the empirical error. We also give an upper bound on the VC-dimension of the new classifier. Our experimental results on 23 standard problems show that quadratic boosting compares favorably with AdaBoost on large data sets at the cost of training speed. The classification time of the two algorithms, however, is equivalent. (C) 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
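For reference, the discrete AdaBoost baseline that the quadratic variant extends can be written compactly. This is the textbook algorithm with weak learners drawn from a user-supplied pool; it is not the paper's quadratic extension:

```python
import math

def adaboost(X, y, pool, rounds=20):
    """Discrete AdaBoost sketch.

    X: inputs; y: labels in {-1, +1}; pool: candidate weak learners,
    each a function h(x) -> -1 or +1. Returns the sign of the
    alpha-weighted vote of the selected weak learners.
    """
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # pick the weak learner with the lowest weighted error
        errs = [(sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi), h)
                for h in pool]
        err, h = min(errs, key=lambda t: t[0])
        err = min(max(err, 1e-12), 1 - 1e-12)  # clip to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # re-weight: misclassified points gain weight
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

The paper's contribution sits on top of this loop: rather than the purely linear vote above, it learns quadratic cross-terms of the base classifiers via an intermediate learner with systematically relabeled training data.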
In the high-dimensional setting, componentwise L2 boosting has been used to construct sparse models that perform well, but it tends to select many ineffective variables. Several sparse boosting methods, such as Sparse L2 boosting and Twin boosting, have been proposed to improve the variable selection of the L2 boosting algorithm. In this article, we propose a new general sparse boosting method (GSboosting). Relations are established between GSboosting and other well-known regularized variable selection methods in the orthogonal linear model, such as the adaptive Lasso and hard thresholding. Simulation results show that GSboosting performs well in both prediction and variable selection.
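Componentwise L2 boosting, the baseline these sparse variants refine, greedily picks one predictor per step and applies shrinkage. A minimal sketch (variable names ours; plain per-coordinate least squares):

```python
def l2_boost(X, y, steps=200, nu=0.1):
    """Componentwise L2 boosting sketch.

    Each step fits the single predictor that best reduces the squared
    residuals and moves its coefficient by a shrunken amount nu * b.
    X: list of rows; y: targets. Returns the coefficient vector.
    """
    p = len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(steps):
        best_j, best_b, best_err = 0, 0.0, float("inf")
        for j in range(p):
            xj = [row[j] for row in X]
            sxx = sum(v * v for v in xj) or 1e-12
            b = sum(v * r for v, r in zip(xj, resid)) / sxx
            err = sum((r - b * v) ** 2 for r, v in zip(resid, xj))
            if err < best_err:
                best_j, best_b, best_err = j, b, err
        beta[best_j] += nu * best_b
        resid = [r - nu * best_b * row[best_j] for r, row in zip(resid, X)]
    return beta
```

Because only the best-fitting coordinate moves per step, irrelevant coefficients can stay exactly zero, which is the sparsity property the variants above try to strengthen.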
A revised support vector regression (SVR) ensemble model based on the boosting algorithm (SVR-boosting) is presented in this paper for electricity price forecasting in the electric power market. In light of the characteristics of the electricity price sequence, a new triangular-shaped loss function is constructed in the training of the forecasting model to inhibit learning from abnormal data in the electricity price sequence. Results on actual data indicate that, compared with a single support vector regression model, the proposed SVR-boosting ensemble model remarkably enhances the stability of the model output, achieves higher predictive accuracy, and possesses comparatively satisfactory generalization capability.
ISBN (Print): 9781509006229
Support vector machines (SVM) have been widely applied in flood forecasting models and have achieved good results. However, they have been plagued by two problems. One is over-reliance on the number and quality of the raw input data; the other is that a single model cannot describe the complex relationships hidden in flood evolution processes. To tackle these two problems, this paper presents an SVM flood forecasting model based on kernel principal component analysis (KPCA) and a boosting algorithm. Nonlinear KPCA is applied to extract useful information from historical flood data. To eliminate the interference caused by redundant information, a boosting learning algorithm with multiple SVM models exploits the various distribution characteristics of the historical flood data. Each sub-model focuses on learning a certain type of samples. Finally, the prediction is obtained by combining the multiple models. Application to flood forecasting at the Wangjiaba station on the Huaihe River shows that the proposed SVM ensemble model based on KPCA and boosting learning can effectively improve flood forecasting accuracy.
ISBN (Print): 9781538617342
The main goal of a telecom operator is to build end-user loyalty towards the offered services. Computing the perceived quality, known as Quality of Experience (QoE), has become a crucial research topic. Machine learning algorithms provide a way to tease out the complex relationships between several influencing factors and QoE. This paper proposes a novel QoE estimation model for video services, namely a boosting Support Vector Regression (BSVR) based QoE model. The purpose of this model is to investigate the effectiveness of combining multiple learners, instead of a classical individual learner, to improve the prediction accuracy of QoE. BSVR is based on a combination of two principal techniques: a boosting algorithm and Support Vector Regression (SVR). More precisely, multiple SVR models are trained in an iterative boosting algorithm to create a powerful predictive model. The use of SVRs as weak learners has several advantages. First, SVR is based on a convex optimization problem, where the globally optimal solution exploits a limited number of support vectors, which improves prediction accuracy while maintaining low computational complexity. Second, each SVR uses a flexible Radial Basis Function (RBF) kernel to model QoE data efficiently. A comparative evaluation of the proposed BSVR-based QoE model shows its superiority over relevant ensemble learning methods and single-learner regression models in terms of prediction accuracy and computational complexity.