ISBN:
(Print) 9788955191387
The overall file transmission time between peer machines is affected by network conditions, file dispatching and retrieval systems, error detection, and the processing capabilities of the peers. Improving the network conditions or the peers' processing capabilities for speedup incurs higher cost. We introduce a wrapper algorithm for the dispatching and retrieval systems at the peers. The algorithm performs file compression-decompression, file split-merge, and file chunk scheduling to speed up the transfer of large files. The wrapper algorithm is wrapped around RSYNC, and the results reveal an overall reduction in file transfer time of up to 66%.
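The abstract does not give the wrapper's internals; as a rough illustration, its compression and split-merge stages (before and after chunks are handed to RSYNC) might look like the sketch below. The chunk size and function names are hypothetical.

```python
import gzip

CHUNK_SIZE = 16  # bytes per chunk; unrealistically small, for illustration only

def compress_and_split(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress the payload, then cut it into fixed-size chunks for transfer."""
    packed = gzip.compress(data)
    return [packed[i:i + chunk_size] for i in range(0, len(packed), chunk_size)]

def merge_and_decompress(chunks):
    """Reassemble the chunks in order and decompress the result."""
    return gzip.decompress(b"".join(chunks))

payload = b"example payload " * 100
chunks = compress_and_split(payload)
# In the real system each chunk would be scheduled and sent via RSYNC; any
# schedule works as long as the original order is restored before merging.
restored = merge_and_decompress(chunks)
assert restored == payload
```

Compressing before splitting means less data crosses the network, while the fixed-size chunks give the scheduler uniform units of work.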
Feature selection (FS) may improve the performance, cost-efficiency, and understandability of supervised machine learning models. In this paper, FS for the recently introduced distance-based supervised machine learning model is considered for regression problems. The study is contextualized by first providing an umbrella review (a review of reviews) of recent developments in the research field. We then propose a saliency-based one-shot wrapper algorithm for FS, called MAS-FS. The algorithm is compared with a set of other popular FS algorithms using a versatile set of simulated and benchmark datasets. Finally, the experimental results underline the usefulness of FS for regression, confirming the utility of certain filter algorithms and particularly the proposed wrapper algorithm. (c) 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http:// ***/licenses/by/4.0/).
The additive model plays an important role in machine learning due to the flexibility and interpretability of its prediction function. However, solving large-scale additive models is challenging for several reasons, and scaling them up remains an open problem. To address this, we propose a new doubly stochastic optimization algorithm for solving generalized additive models (DSGAM). We first give a generalized formulation of additive models without the orthogonality hypothesis on the basis functions. We then propose a wrapper algorithm to optimize the generalized additive model. Importantly, we introduce a doubly stochastic gradient algorithm (DSG) to solve an inner subproblem of the wrapper algorithm, which scales well in sample size and dimensionality simultaneously. Finally, we prove a fast convergence rate for DSGAM. Experimental results on various large-scale benchmark datasets confirm the fast convergence of DSGAM and show a large reduction in computational time compared with existing algorithms, while retaining similar generalization performance.
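The paper's inner DSG solver is not spelled out in the abstract, but the "doubly stochastic" idea (each step samples one training example and one coordinate/basis term) can be illustrated on a toy additive model with identity basis functions. All names and constants below are hypothetical.

```python
import random
random.seed(0)

# Toy additive model f(x) = sum_j w_j * phi_j(x_j), with phi_j the identity.
# "Doubly stochastic": each step samples one training example AND one basis term,
# so the per-step cost is independent of both n and d.
n, d = 200, 5
true_w = [1.0, -2.0, 0.5, 0.0, 3.0]
X = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
y = [sum(w * x for w, x in zip(true_w, row)) for row in X]

w = [0.0] * d
lr = 0.1
for _ in range(20000):
    i = random.randrange(n)  # random training example
    j = random.randrange(d)  # random coordinate (basis function)
    residual = sum(wk * xk for wk, xk in zip(w, X[i])) - y[i]
    w[j] -= lr * residual * X[i][j]  # partial gradient step on coordinate j only
```

On this noiseless toy problem the iterates recover `true_w` closely; the point of the construction is that each update touches a single sample and a single basis term.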
Traffic classification is currently a significant challenge for network monitoring and management. Feature selection is an effective way to reduce dimensionality and remove redundant information. To achieve accurate traffic classification at a lower cost in evaluations, a hybrid feature subset selection method is proposed based on a sliding block whose size adapts to the classification performance. Furthermore, an incremental convergence strategy is designed on top of the hybrid selection method: it gathers all the features that have been selected and, to exploit the relationships among the selected features, adds an extra round of selection to the original algorithm. The performance is examined in three groups of experiments. Our theoretical analysis and experimental observations reveal that the proposed method needs fewer evaluations while achieving similar or even better classification performance across different initial block sizes. Moreover, the incremental convergence strategy further improves the classification accuracy.
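One way to picture an adaptive sliding block is forward selection over filter-ranked features taken a block at a time, growing the block after a successful addition and shrinking it after a failure. The correlation-based score below is a cheap stand-in for the classifier evaluation the paper uses; the data, penalty weight, and block-size rule are all hypothetical.

```python
import math, random
random.seed(7)

n, d = 400, 8
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [row[1] + row[4] + random.gauss(0, 0.05) for row in X]  # features 1 and 4 matter

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    return cov / (math.sqrt(sum((x - ma) ** 2 for x in a)) *
                  math.sqrt(sum((z - mb) ** 2 for z in b)))

col = lambda j: [row[j] for row in X]

def subset_score(feats):
    """Stand-in for classifier accuracy: total relevance minus a size penalty."""
    return sum(abs(pearson(col(j), y)) for j in feats) - 0.3 * len(feats)

ranked = sorted(range(d), key=lambda j: -abs(pearson(col(j), y)))
selected, block, i = [], 2, 0
while i < len(ranked):
    cand = selected + ranked[i:i + block]
    if subset_score(cand) > subset_score(selected):
        selected, i = cand, i + block
        block += 1        # grow the block after a success
    elif block > 1:
        block -= 1        # shrink the block and retry the same position
    else:
        i += 1            # a single feature was rejected: slide past it
```

Shrinking before skipping means each rejected block still gets a cheaper second chance, which is how the flexible block size trades evaluations for accuracy.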
A great amount of data is being created these days and stored in massive datasets containing many irrelevant attributes unrelated to the target concept. Feature selection deals with choosing the most pertinent features, which also helps to increase classification accuracy. Feature selection can be viewed as a multiobjective optimization problem with two goals: improving the classification accuracy and reducing the number of features used. Drone Squadron Optimization (DSO) is one of the most recent artifact-inspired optimization algorithms; it has two key components: semi-autonomous drones that hover over a terrain and a command center that manages the drones. In this paper, two binary variants of DSO are proposed for the feature selection problem. The proposed binary algorithms are evaluated on 21 benchmark datasets against five state-of-the-art algorithms: Grey Wolf Optimizer (GWO), Particle Swarm Optimization (PSO), the Flower Pollination Algorithm (FPA), the Genetic Algorithm (GA), and Ant Lion Optimization (ALO). Several assessment indicators are used to measure the diversification and intensification of the optimization algorithms. Compared with current state-of-the-art wrapper-based algorithms, the proposed binary techniques are more efficient at scanning the dimension space and picking the most useful features for classification tasks, yielding the lowest classification error rate.
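The abstract does not say how the continuous DSO positions are turned into feature subsets; a common device in binary wrapper metaheuristics (binary PSO, binary GWO, and plausibly a binary DSO) is an S-shaped transfer function. This is a generic sketch of that device, not the paper's exact scheme.

```python
import math, random
random.seed(42)

def s_shaped(x):
    """S-shaped (sigmoid) transfer function mapping a position to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position):
    """Turn a continuous position vector into a 0/1 feature-selection mask."""
    return [1 if random.random() < s_shaped(v) else 0 for v in position]

# A strongly negative coordinate is almost never selected;
# a strongly positive one almost always is.
mask = binarize([-6.0, 0.0, 6.0])
```

The resulting mask indexes the feature subset whose classification error the wrapper then evaluates.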
Background: High-throughput bio-OMIC technologies are producing high-dimensional data from bio-samples at an ever-increasing rate, whereas the number of training samples in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in biomedical "big data" may be at least partly addressed by feature selection algorithms, which select only the features significantly associated with the phenotypes. Feature selection is an NP-hard problem: because the time required to find the globally optimal solution grows exponentially, all existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. Results: This work describes a feature selection algorithm based on a recently published correlation measure, the Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features that are associated with the phenotypes and independent of each other, while achieving high classification performance with the nearest neighbor classifier. In a comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly fewer selected features. The features selected by McTwo also appear, from the literature, to have particular biomedical relevance to the phenotypes. Conclusion: McTwo selects a feature subset with very good classification performance and a small number of features, so it may serve as a complementary feature selection algorithm for high-dimensional biomedical datasets.
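McTwo's exact procedure is not given in the abstract, but its stated goal (features associated with the phenotype yet independent of each other) can be illustrated with a relevance-minus-redundancy filter. Plug-in mutual information on discrete values stands in for MIC below, and the thresholds and names are hypothetical.

```python
import math
from collections import Counter

def mutual_info(a, b):
    """Plug-in mutual information (in nats) of two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def select(features, y, relevance_t=0.1, redundancy_t=0.1):
    """Keep features informative about y and non-redundant with those kept."""
    kept = []
    for name, col in features.items():
        if mutual_info(col, y) < relevance_t:
            continue  # not associated with the phenotype
        if all(mutual_info(col, features[k]) < redundancy_t for k in kept):
            kept.append(name)  # independent of everything kept so far
    return kept

features = {
    "f_signal":    [0, 0, 1, 1, 0, 0, 1, 1],  # tracks the phenotype
    "f_duplicate": [0, 0, 1, 1, 0, 0, 1, 1],  # redundant copy of f_signal
    "f_noise":     [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the phenotype
}
y = [0, 0, 1, 1, 0, 0, 1, 1]
```

Here `select(features, y)` keeps only `f_signal`: the duplicate is rejected as redundant and the noise feature as irrelevant, mirroring the small-subset behaviour the abstract reports.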
ISBN:
(Print) 9781728132488
With the rapid development of microarray technology and interdisciplinary science, it has become possible to use microarray technology to predict diseases, with the advantages of high speed, high efficiency, and reliability. However, microarray data are usually high-dimensional with small sample sizes, and the samples are often imbalanced, which poses many difficulties for researchers. In view of these problems, this paper proposes a filter-wrapper hybrid feature selection algorithm, Union Information Gini Cost-sensitive Feature Selection General Vector Machine (UIG-CFGVM), to tackle the high-dimensional, imbalanced, small-sample problem. The hybrid algorithm works as follows: first, irrelevant features are removed by the proposed hybrid filter UIG, which combines Information Gain (Info) and the Gini Index (Gini); second, the Cost-sensitive Feature selection General Vector Machine (CFGVM) is used as the wrapper stage to further improve performance. Experimental results on seven biomedical high-dimensional, imbalanced, small-sample datasets show that UIG-CFGVM achieves better classification performance than other similar algorithms.
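The two filter criteria named in the abstract are standard; a plausible reading of the "union" filter stage is to keep any feature ranked highly by either criterion before the wrapper runs. The ranking cutoff `k`, the toy data, and the union rule itself are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Entropy of the labels minus entropy conditioned on the feature value."""
    n, cond = len(labels), 0.0
    for v in set(feature):
        sub = [l for f, l in zip(feature, labels) if f == v]
        cond += len(sub) / n * entropy(sub)
    return entropy(labels) - cond

def gini_index(feature, labels):
    """Weighted Gini impurity after splitting on the feature (lower is better)."""
    n, total = len(labels), 0.0
    for v in set(feature):
        sub = [l for f, l in zip(feature, labels) if f == v]
        counts = Counter(sub)
        total += len(sub) / n * (1.0 - sum((c / len(sub)) ** 2 for c in counts.values()))
    return total

def union_filter(features, labels, k=1):
    """Keep features in the top-k of either ranking, for the wrapper stage."""
    by_ig = sorted(features, key=lambda f: info_gain(features[f], labels), reverse=True)
    by_gini = sorted(features, key=lambda f: gini_index(features[f], labels))
    return sorted(set(by_ig[:k]) | set(by_gini[:k]))

features = {
    "f_good": [0, 0, 1, 1, 0, 1],  # mirrors the labels exactly
    "f_bad":  [1, 1, 1, 0, 0, 0],  # weakly related to the labels
}
labels = [0, 0, 1, 1, 0, 1]
kept = union_filter(features, labels, k=1)
```

Taking the union rather than the intersection is the safer choice before a wrapper stage: a feature missed by one criterion can still survive via the other, and the wrapper (CFGVM in the paper) prunes any false positives.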
ISBN:
(Print) 9781538619315
Feature selection is an important task in data mining, which aims to reduce the dimensionality of the data sets while at least maintaining the classification performance. The chicken swarm optimization algorithm (CSO) has been widely applied to feature selection because of its efficiency and effectiveness. However, since feature selection is a challenging task with a complex search space, CSO quickly gets stuck in local minima. This paper aims to improve the searching ability of CSO by applying logistic and tent chaotic maps to help the swarm explore the search space better. The proposed chaotic chicken swarm optimization (CCSO)-based feature selection algorithm is compared with four feature selection algorithms on five benchmark data sets. A comparison among several popular classifiers is also made to determine the sensitivity of each classifier to the selected features and the dimension reduction. Over the iterations, the best fitness value shows a remarkable improvement in classification accuracy.
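For reference, the two maps named in the abstract (reading "tend" as the tent map) generate deterministic yet chaotic sequences that can replace uniform random draws when initializing or perturbing swarm positions. The parameter values below are the standard fully-chaotic settings; how CCSO injects the sequences is not specified in the abstract.

```python
def logistic_map(x, r=4.0):
    """Logistic map; r = 4 gives fully chaotic behaviour on (0, 1)."""
    return r * x * (1.0 - x)

def tent_map(x, mu=2.0):
    """Tent map, another classic chaotic generator on (0, 1)."""
    return mu * x if x < 0.5 else mu * (1.0 - x)

# A short chaotic sequence: deterministic, non-repeating, and spread over (0, 1),
# which helps a swarm cover the search space more evenly than one random seed.
seq, x = [], 0.31
for _ in range(5):
    x = logistic_map(x)
    seq.append(x)
```

Because consecutive values are highly sensitive to the starting point, two nearby hens quickly diverge, which is precisely the diversification behaviour the paper exploits to escape local minima.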