ISBN: 9781628419009 (print)
Boosted decision trees are fast at test time, but their training is too slow for applications that require real-time learning. To overcome this drawback, we propose a fast decision tree training method that prunes ineffective features in advance, and on this basis we design a fast training algorithm for boosted decision trees. First, we analyze the structure of each decision tree node and prove, through derivation, that the classification error of each node is bounded. Then, by using this error bound to prune ineffective features at an early stage, we greatly accelerate decision tree training without affecting the training results at all. Finally, the accelerated decision tree training method is integrated into the general boosting process to form a fast boosted decision tree training algorithm. This algorithm is not a new variant of boosting; rather, it should be used in conjunction with existing boosting algorithms to achieve further training acceleration. To test the algorithm's speedup and its behavior when combined with other acceleration algorithms, the original AdaBoost and two typical acceleration algorithms, LazyBoost and StochasticBoost, were each combined with this algorithm to produce three fast versions, and their classification performance was tested on the Lsis face database, which contains 12788 images. Experimental results show that this fast algorithm can more than double training speed without affecting the results of the trained classifier, and that it can be combined with other acceleration algorithms.
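The abstract does not state the error bound itself, so the following is only a minimal sketch of the general idea under a stated assumption: with non-negative sample weights, the best weighted error of a decision stump on a subset of the samples can never exceed its best error on the full set, so the subset error serves as a lower bound. A feature whose lower bound already matches or exceeds the best error found so far can be skipped without changing the result. The function names, the choice of the heaviest samples as the subset, and the `subset_size` parameter are all hypothetical and are not taken from the paper.

```python
import math


def best_stump(values, labels, weights):
    """Best threshold stump on one feature: returns (error, threshold, polarity).

    A stump predicts polarity * (+1 if value > threshold else -1).
    Labels are -1 or +1, weights are non-negative.
    """
    total = sum(weights)
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Threshold at -inf with polarity +1 predicts +1 for every sample.
    err = sum(weights[i] for i in order if labels[i] == -1)
    best_err, best_thr, best_pol = err, -math.inf, 1
    if total - err < best_err:
        best_err, best_pol = total - err, -1
    for k, i in enumerate(order):
        # Moving the threshold past sample i flips its prediction to -1.
        err += weights[i] if labels[i] == 1 else -weights[i]
        last = k + 1 == len(order)
        if not last and values[order[k + 1]] == values[i]:
            continue  # no threshold fits between equal values
        thr = math.inf if last else (values[i] + values[order[k + 1]]) / 2
        for e, pol in ((err, 1), (total - err, -1)):
            if e < best_err:
                best_err, best_thr, best_pol = e, thr, pol
    return best_err, best_thr, best_pol


def train_stump_pruned(X, y, w, subset_size):
    """Search all features, skipping those whose lower bound cannot win.

    Returns ((error, feature, threshold, polarity), number_of_skipped_features).
    """
    n, d = len(X), len(X[0])
    # Assumption: the heaviest samples form the preliminary subset.
    subset = sorted(range(n), key=lambda i: -w[i])[:subset_size]
    ys = [y[i] for i in subset]
    ws = [w[i] for i in subset]
    best = (math.inf, None, None, None)
    skipped = 0
    for j in range(d):
        # Cheap pass on the subset gives a lower bound on the full error.
        lower, _, _ = best_stump([X[i][j] for i in subset], ys, ws)
        if lower >= best[0]:
            skipped += 1  # this feature cannot beat the current best
            continue
        err, thr, pol = best_stump([row[j] for row in X], y, w)
        if err < best[0]:
            best = (err, j, thr, pol)
    return best, skipped


# Tiny usage example: feature 0 separates the classes perfectly.
X_demo = [[0.1, 5, 1], [0.2, 3, 0], [0.8, 4, 1], [0.9, 2, 0]]
y_demo = [-1, -1, 1, 1]
w_demo = [1.0, 1.0, 1.0, 1.0]
stump, n_skipped = train_stump_pruned(X_demo, y_demo, w_demo, subset_size=2)
```

Because a feature is only skipped when its lower bound rules it out, the selected stump is the same one an exhaustive search would return, which is the property the abstract claims for its own bound. Inside a boosting loop, `w` would be the current sample weights of the booster.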
The goal of this study was to identify and describe the extent to which a comprehensive set of risk factors from the ecological model are associated with physical intimate partner violence (IPV) victimization in Mexico. To achieve this goal, a structured additive probit model is applied to a dataset of 35,000 observations and 42 theoretical correlates from 10 data sources. Because of the model's high dimensionality, boosting is used for estimation while simultaneously performing variable selection and model choice. The findings indicate that age at sexual initiation and marriage, sexual and professional autonomy, social connectedness, household overcrowding, housework division, women's political participation, and geographical space are associated with physical IPV. The findings provide evidence of risk factors that were previously unknown in Mexico or were based solely on theoretical grounds without empirical testing. Specifically, this paper makes three key contributions. First, by examining the individual and relationship levels, it was possible to identify high-risk population subgroups that are often overlooked, such as women who experienced sexual initiation during childhood and women living in overcrowded families. Second, the inclusion of community factors made it possible to identify the importance of promoting women's political participation. Finally, the introduction of several emerging indicators made it possible to examine the experiences faced by women in various aspects of life, such as decision-making power, social networks, and the division of housework.
ISBN: 1424403316 (print)
The paper describes a method for vehicle recognition using a generic shape model and boosted neural network classifiers. The generic shape model, which is able to represent different vehicle classes, is derived by principal component analysis on a set of training shapes recovered automatically from 2D image sequences. The pose parameters and the shape parameters of the model are estimated by fitting the model to the vehicle in each image using a genetic algorithm, and these parameters are then used to classify the vehicle. To improve recognition accuracy and speed, we develop adaptive boosting neural network classifiers for vehicle recognition. Our approach is shown to be more accurate and faster than existing methods.
ISBN: 9780769545639 (print)
This paper presents a reconfigurable architecture of a classification module based on the Adaboost algorithm. This architecture is used for object detection based on the attributes of color and texture. The Adaboost algorithm module uses decision trees as weak classifiers. This high-performance architecture processes up to 325 dense images of size 640 x 480 pixels, classifying all the structured objects contained in the image. Classification results are provided as an image of the same size. Both architectures, for the Adaboost algorithm and for the decision trees, are discussed and compared with several studies found in the literature. The conclusions and perspectives of the project are provided at the end of this document.
ISBN: 9798350344868; 9798350344851 (print)
Implicit Neural Representation (INR) has emerged as an effective method for unsupervised image denoising. However, INR models are typically overparameterized; consequently, they are prone to overfitting during learning, which leads to suboptimal or even noisy results. To tackle this problem, we propose a general recipe for regularizing INR models in image denoising. In detail, we propose to iteratively substitute the supervision signal with the mean of the prediction and the supervision signal during the learning process. We theoretically prove that this simple iterative substitution gradually enhances the signal-to-noise ratio of the supervision signal, thereby benefiting INR models during the learning process. Our experimental results demonstrate that INR models can be effectively regularized by the proposed approach, relieving overfitting and boosting image denoising performance.
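The substitution rule described above can be written in a few lines. This is only an illustrative sketch: `refine_targets` and its `fit` argument are hypothetical names, and the moving-average smoother is a stand-in for fitting an INR model, which the abstract does not specify in detail.

```python
def refine_targets(noisy, fit, rounds):
    """Repeatedly replace the targets with the mean of prediction and targets.

    `fit` is any callable that takes the current targets and returns the
    model's predictions for the same positions.
    """
    targets = list(noisy)
    for _ in range(rounds):
        prediction = fit(targets)
        # Core rule: new supervision = (prediction + old supervision) / 2.
        targets = [(p + t) / 2 for p, t in zip(prediction, targets)]
    return targets


def moving_average(values):
    """Stand-in for model fitting: a 3-point moving average (assumption)."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Usage: smooth a noisy 1D signal over a few substitution rounds.
noisy_signal = [1.0, 1.4, 0.7, 1.2, 0.9, 1.1]
refined = refine_targets(noisy_signal, moving_average, rounds=3)
```

In an actual training loop, the substitution would presumably happen between optimization steps rather than after a full fit, but the abstract does not say how often the targets are updated.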
Although there is no strict consensus, some studies have reported that postictal generalized EEG suppression (PGES) is a potential electroencephalographic (EEG) biomarker for the risk of sudden unexpected death in epilepsy (SUDEP). PGES is an epoch of EEG inactivity after a seizure, and detecting PGES in clinical data is extremely difficult because artifacts from breathing, movement, and muscle activity can degrade the quality of the recorded EEG data. Even clinical experts visually interpreting the EEG will have diverse opinions on the start and end of PGES for a given patient. An automated EEG suppression detection tool can assist clinical personnel in the review and annotation of seizure files, and can also provide a standard for quantifying PGES in large patient cohorts, possibly leading to further clarification of the role of PGES as a biomarker of SUDEP risk. In this paper, we develop an automated system that can detect the start and end of PGES using frequency-domain features in combination with boosting classification algorithms. The average power in different frequency ranges of the EEG signal is extracted from the prefiltered recorded signal using the fast Fourier transform and is used as the feature set for the classification algorithm. The underlying classifiers for the boosting algorithm are linear classifiers using a logistic regression model. The tool is developed using 12 seizures annotated by an expert, then tested and evaluated on another 20 seizures that were annotated by 11 experts.
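As a rough illustration of the feature extraction step, the sketch below computes the average spectral power in a set of frequency bands for one signal window. The band edges, the sampling rate, and the name `band_powers` are assumptions made for the example; the abstract does not list the frequency ranges used. A naive discrete Fourier transform keeps the sketch dependency-free; in practice an FFT routine such as `numpy.fft.rfft` would produce the same spectrum much faster.

```python
import cmath
import math


def band_powers(signal, fs, bands):
    """Average spectral power of `signal` in each (low, high) band in Hz."""
    n = len(signal)
    half = n // 2 + 1  # non-negative frequencies only
    # Naive DFT, O(n^2); an FFT would give the same values.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(half)
    ]
    power = [abs(c) ** 2 / n for c in spectrum]
    freqs = [k * fs / n for k in range(half)]
    features = []
    for low, high in bands:
        in_band = [p for f, p in zip(freqs, power) if low <= f < high]
        features.append(sum(in_band) / len(in_band) if in_band else 0.0)
    return features


# Hypothetical band edges in Hz (not taken from the paper).
BANDS = [(0.5, 4.0), (4.0, 8.0), (8.0, 13.0), (13.0, 30.0)]

# Usage: a one-second 10 Hz sine sampled at 64 Hz falls in the third band.
fs = 64
window = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
features = band_powers(window, fs, BANDS)
```

Each window of the recording would yield one such feature vector, which is then passed to the boosted classifier.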
We present a comparative study of the most popular machine learning methods applied to the challenging problem of customer churn prediction in the telecommunications industry. In the first phase of our experiments, all models were applied and evaluated using cross-validation on a popular, public domain dataset. In the second phase, the performance improvement offered by boosting was studied. To determine the most efficient parameter combinations, we performed a series of Monte Carlo simulations for each method and for a wide range of parameters. Our results demonstrate clear superiority of the boosted versions of the models over the plain (non-boosted) versions. The best overall classifier was the SVM-POLY using AdaBoost, with an accuracy of almost 97% and an F-measure over 84%. (C) 2015 Elsevier B.V. All rights reserved.
The problem of "learning to rank" is a popular research topic in the Information Retrieval (IR) and machine learning communities. Some existing list-wise methods, such as AdaRank, directly use IR measures as performance functions to quantify how well a ranking function can predict rankings. However, the IR measures only account for the document ranks and do not consider how well the algorithm predicts the relevance scores of documents. These methods do not make the best use of the available prior knowledge and may lead to suboptimal performance. Hence, we conduct research by combining both the document ranks and relevance scores. We propose a novel performance function that encodes the relevance scores. We also define performance functions by combining our proposed one with MAP or NDCG, respectively. The experimental results on the benchmark data collections show that our methods can significantly outperform the state-of-the-art AdaRank baselines. (C) 2010 Elsevier B.V. All rights reserved.
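The limitation noted above can be seen directly in a standard NDCG computation: the measure depends only on the order the predicted scores induce, not on the score values. The sketch below uses the common gain `2^rel - 1` with a `log2` position discount, which is one widespread convention and an assumption here; it does not reproduce the performance function proposed in the paper, which the abstract does not define.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance labels in ranked order."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(relevances))


def ndcg(scores, relevances):
    """NDCG of the ranking induced by sorting documents by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(relevances, reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([relevances[i] for i in order]) / ideal


# Two very different score vectors that induce the same ordering
# receive the same NDCG, whatever the predicted score values are.
labels = [2, 1, 0]
close_to_labels = ndcg([2.1, 0.9, 0.1], labels)
far_from_labels = ndcg([100.0, 2.0, 1.0], labels)
```

Both calls return the same value because the ordering is identical, which is why a rank-only measure cannot reward a model for predicting relevance scores accurately.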
Most recommender systems have too many items to propose to too many users based on limited information. This problem is formally known as the sparsity of the ratings matrix, because this is the structure that holds user preferences. This paper outlines a Collaborative Filtering Recommender System that tries to remedy this situation. After applying Singular Value Decomposition to reduce the dimensionality of the data, our system makes use of a dynamic Artificial Neural Network architecture with boosted learning to predict user ratings. Furthermore, we use the concept of k-separability to deal with the resulting noisy data, a methodology not yet tested in Recommender Systems. The combination of these techniques applied to the MovieLens datasets seems to yield promising results.
Most real datasets are imbalanced. Data imbalance occurs when the number of instances in some classes greatly exceeds the number of instances in other classes. Whether in the field of data mining or machine learning, data imbalance can have adverse effects. At present, the methods for solving the problem of data imbalance can be divided into data-level methods, algorithm-level methods, and hybrid methods. In this paper, we propose a weighted hybrid ensemble method for classifying imbalanced data in binary classification tasks, called WHMBoost. Within the framework of the boosting algorithm, the presented method combines two data sampling methods and two base classifiers; each sampling method and each base classifier is assigned a corresponding weight, so that their advantages complement one another. The performance of WHMBoost has been evaluated on 40 benchmark imbalanced datasets against state-of-the-art ensemble methods such as AdaBoost, RUSBoost, and SMOTEBoost, using AUC, F-Measure, and Geometric Mean as the performance evaluation criteria. Experimental results show significant improvement over the other methods, and it can be concluded that WHMBoost is a promising and effective algorithm for dealing with imbalanced datasets. (C) 2020 Elsevier B.V. All rights reserved.
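For reference, the three evaluation criteria named above can be computed as follows for a binary task. This is a plain sketch of the standard definitions, assuming labels are 1 for the minority (positive) class and 0 for the majority class; the function names are illustrative and nothing here is specific to WHMBoost.

```python
import math


def f_measure_and_gmean(y_true, y_pred):
    """Return (F-measure, geometric mean) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return f_measure, math.sqrt(recall * specificity)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage: predicting the majority class everywhere gives 80% accuracy here,
# yet both F-measure and geometric mean are zero.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10
f_score, g_mean = f_measure_and_gmean(truth, always_majority)
```

The usage example shows why these criteria are preferred over plain accuracy on imbalanced data: a classifier that ignores the minority class entirely scores zero on both.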