Genetic programming (GP) has been successfully applied to classification. However, GP may evolve biased classifiers when it encounters class imbalance, and such biased classifiers are often too unreliable for real-world applications. High dimensionality makes it even harder for classifiers to separate the majority class from the minority class. The use of GP to handle the joint effect of high dimensionality and class imbalance has not been heavily investigated. In this paper, we propose a GP approach to high-dimensional imbalanced classification, with the goals of increasing classification performance and reducing training time. To achieve these goals, a new fitness function is developed to address class imbalance, and a strategy is proposed that reuses previously evolved good GP individuals to improve efficiency. The proposed method is examined on ten high-dimensional imbalanced datasets. Experimental results show that, for high-dimensional imbalanced classification, the proposed method generally outperforms other GP methods as well as traditional classification algorithms that use sampling methods to address class imbalance.
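The abstract does not give the new fitness function, so the following is only a minimal sketch of why plain accuracy misleads under class imbalance and what a class-balanced fitness looks like. It assumes binary labels (1 = minority, 0 = majority) and the common GP convention that a non-negative program output predicts the minority class; it uses standard balanced accuracy, not the paper's function.

```python
from typing import List, Sequence


def outputs_to_labels(outputs: Sequence[float]) -> List[int]:
    """Map raw GP program outputs to labels (assumed rule: output >= 0 -> minority class 1)."""
    return [1 if out >= 0 else 0 for out in outputs]


def balanced_accuracy_fitness(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of minority-class and majority-class accuracy (higher is better)."""
    minority = [p for t, p in zip(y_true, y_pred) if t == 1]
    majority = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not minority or not majority:
        raise ValueError("both classes must be present in y_true")
    tpr = sum(1 for p in minority if p == 1) / len(minority)  # minority-class accuracy
    tnr = sum(1 for p in majority if p == 0) / len(majority)  # majority-class accuracy
    return (tpr + tnr) / 2


# A classifier that always predicts the majority class on a 1:9 dataset
y_true = [1] + [0] * 9
always_majority = outputs_to_labels([-1.0] * 10)
plain_accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(plain_accuracy)                                        # 0.9
print(balanced_accuracy_fitness(y_true, always_majority))    # 0.5
```

Plain accuracy rewards the trivial majority-class predictor with 0.9, while the balanced fitness scores it at chance level (0.5). That is the kind of bias an imbalance-aware fitness function is meant to remove.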
Image classification is a popular task in machine learning and computer vision, but it is very challenging because of the high variation across images. Ensemble methods can achieve higher classification performance on image classification than a single classification algorithm. However, a good ensemble requires its component (base) classifiers to be both accurate and diverse. To solve image classification effectively, feature extraction is necessary to transform raw pixels into high-level informative features, but this process often requires domain knowledge. This article proposes an evolutionary approach based on genetic programming to automatically and simultaneously learn informative features and evolve effective ensembles for image classification. The new approach takes raw images as inputs and returns class-label predictions based on the evolved classifiers. To achieve this, a new individual representation, a new function set, and a new terminal set are developed so that the approach can effectively find the best solution. More importantly, the solutions of the new approach can extract informative features from raw images and automatically address the diversity issue of the ensembles. In addition, the new approach can automatically select and optimize the parameters of the classification algorithms in the ensemble. The performance of the new approach is examined on 13 image classification datasets of varying difficulty and compared with a large number of effective methods. The results show that the new approach achieves better classification accuracy than the competitive methods on most datasets. Further analysis demonstrates that the new approach can evolve solutions with high accuracy and diversity.
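The abstract stresses that base classifiers must be accurate and diverse but does not say how their predictions are combined. As a hedged illustration, the sketch below assumes plain majority voting and measures diversity with pairwise disagreement, one simple and widely used measure. Neither choice is taken from the article.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine base-classifier outputs; predictions[k][i] is classifier k's label for image i."""
    n_instances = len(predictions[0])
    return [
        # most_common orders equal counts by first occurrence, so ties go to the first label seen
        Counter(clf[i] for clf in predictions).most_common(1)[0][0]
        for i in range(n_instances)
    ]


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of instances on which two base classifiers predict different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


base_predictions = [
    [0, 1, 1],  # classifier A
    [0, 0, 1],  # classifier B
    [1, 1, 0],  # classifier C
]
print(majority_vote(base_predictions))                          # [0, 1, 1]
print(disagreement(base_predictions[0], base_predictions[1]))   # 0.333...
```

Voting only helps when base classifiers err on different instances. A disagreement of zero between all pairs would make the ensemble no better than any single member.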
Because malicious intrusions into critical information infrastructures are essential to the success of cyberterrorists, effective intrusion detection is also essential for defending such infrastructures. Cyberterrorism thrives on the development of new technologies; in response, intrusion detection methods must be robust, adaptive, and efficient. We hypothesize that genetic programming algorithms can aid in this endeavor. To investigate this proposition, we conducted an experiment using the very large 1999 Knowledge Discovery in Databases (KDD) Cup dataset, supplied by the Defense Advanced Research Projects Agency (DARPA) and MIT's Lincoln Laboratories. Using machine-coded linear genomes and a homologous crossover operator in genetic programming, we achieved promising results in detecting malicious intrusions. The resulting programs execute in real time and identify both positive and negative instances with high accuracy. (C) 2006 Elsevier B.V. All rights reserved.
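To clarify what "homologous crossover" means for linear genomes, here is a minimal sketch. It assumes the basic idea that both parents exchange a segment occupying the same positions, so that code at a given genome position is swapped only with code at that same position and both children keep their parents' lengths. The toy instruction tuples are hypothetical stand-ins for machine code, and the paper's operator may differ in detail.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def homologous_crossover(
    p1: Sequence[T], p2: Sequence[T], rng: random.Random
) -> Tuple[List[T], List[T]]:
    """Swap one segment that occupies the same positions in both linear genomes."""
    limit = min(len(p1), len(p2))  # the segment must fit inside the shorter parent
    if limit == 0:
        return list(p1), list(p2)
    start = rng.randrange(limit)               # 0 <= start < limit
    end = rng.randrange(start + 1, limit + 1)  # start < end <= limit
    child1 = list(p1[:start]) + list(p2[start:end]) + list(p1[end:])
    child2 = list(p2[:start]) + list(p1[start:end]) + list(p2[end:])
    return child1, child2


# Hypothetical register-machine instructions: (opcode, dest, src1, src2)
parent_a = [("add", 0, 1, 2), ("mul", 1, 0, 0), ("sub", 0, 0, 1)]
parent_b = [("div", 2, 1, 0), ("add", 0, 2, 2)]
child_a, child_b = homologous_crossover(parent_a, parent_b, random.Random(42))
print(len(child_a), len(child_b))  # 3 2: lengths are preserved
```

Compared with standard two-point crossover on linear genomes, aligning the segments limits how far instructions move from their original positions, which is the usual motivation for the homologous variant.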
The development of quantitative structure-retention relationships (QSRR) aims at constructing an appropriate linear/nonlinear model for predicting the retention behavior (such as the Kovats retention index) of a solute on a chromatographic column. Multi-linear regression and artificial neural networks are commonly used for QSRR development in gas chromatography (GC). In this study, an artificial-intelligence-based, data-driven modeling formalism, namely genetic programming (GP), is introduced for developing quantitative structure-based models that predict Kovats retention indices (KRI). The novelty of the GP formalism is that, given an example dataset, it searches for and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, the form of the data-fitting model need not be specified in advance in GP-based modeling. The resulting models are also less complex, simple to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs is demonstrated by developing models that predict the KRIs of light hydrocarbons (case study I) and adamantane derivatives (case study II). In each case study, two-, three-, and four-descriptor models were developed using KRI data available in the literature. The results clearly indicate that the GP-based models possess excellent KRI prediction accuracy and generalization capability. Specifically, the best-performing four-descriptor models in both case studies yielded high (>0.9) values of the coefficient of determination (R²) and low values of the root mean squared error (RMSE) and mean absolute percent error (MAPE) for the training, test, and validation sets. The distinguishing feature of this study is that it introduces a practical and effective GP-based method for developing QSRRs in gas chromatography that can also be used to develop other types of data-driven models in chromatography science. (C) 2
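The three goodness-of-fit statistics reported for the KRI models can be computed as follows. This is a minimal sketch using textbook definitions. It assumes R² is defined as 1 - SS_res/SS_tot; some QSRR studies instead report the squared Pearson correlation, and the abstract does not say which variant was used. The sample values are made up for illustration.

```python
import math
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target (here, retention index units)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percent error; assumes no observed value is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


observed = [100.0, 200.0, 300.0]   # illustrative values, not data from the study
predicted = [110.0, 190.0, 300.0]
print(r_squared(observed, predicted))  # 0.99
print(rmse(observed, predicted))       # about 8.165
print(mape(observed, predicted))       # about 5.0
```

Reporting all three is useful because R² is scale-free, RMSE is in retention-index units, and MAPE expresses the error relative to the magnitude of each index.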
In this work we present an extensive bibliometric and content-based analysis of the scientific literature on genetic programming in the twenty-first century. Our work has two distinguishing features. First, we reveal the topics emerging from the literature through an unsupervised analysis of the textual content of titles and abstracts. Second, we run all of our analyses twice: once on the papers published in venues typical of the evolutionary computation research community, and once on those published in all other venues. This view from "both sides of the fence" gives us broader and deeper insight into the actual contributions of our community.
Various machine learning techniques exist for regression on temporal data in which concept drift occurs. However, in many nonstationary environments these techniques may fail to track or detect the changes. This study develops a genetic programming-based predictive model for temporal data with a numerical target that tracks changes in a dataset caused by concept drift. When an environmental change is evident, the proposed algorithm reacts by clustering the data and then inducing nonlinear models that describe the resulting clusters. These nonlinear models become the terminal nodes of genetic programming model trees. Experiments were carried out on seven nonstationary datasets, and the results suggest that the proposed model achieves high adaptation rates and accuracy under several types of concept drift. Future work will consider strengthening the adaptation to concept drift and implementing genetic programming on GPUs to provide fast learning for high-speed temporal data.
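The key structural idea, models induced per cluster serving as terminal nodes of a GP tree, can be sketched as below. This is a minimal illustration under stated assumptions: the `ModelTerminal` and `FunctionNode` classes are hypothetical, each cluster model is represented only as a prediction callable, and the clustering and model-induction steps are omitted.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class ModelTerminal:
    """Leaf node wrapping a model induced for one cluster (hypothetical interface)."""
    predict: Callable[[Sequence[float]], float]

    def eval(self, x: Sequence[float]) -> float:
        return self.predict(x)


@dataclass
class FunctionNode:
    """Internal GP node applying a binary function to the outputs of its two subtrees."""
    op: Callable[[float, float], float]
    left: "Node"
    right: "Node"

    def eval(self, x: Sequence[float]) -> float:
        return self.op(self.left.eval(x), self.right.eval(x))


Node = Union[ModelTerminal, FunctionNode]

# Stand-ins for nonlinear cluster models; in the described method these are learned from data.
m1 = ModelTerminal(lambda x: 0.5 * x[0] ** 2)
m2 = ModelTerminal(lambda x: 3.0 * x[1] - 1.0)
tree = FunctionNode(operator.add, m1, m2)
print(tree.eval([2.0, 1.0]))  # 4.0, since 0.5 * 2**2 + (3 * 1 - 1) = 2 + 2
```

Because the leaves are whole models rather than raw inputs or constants, GP only has to evolve how the cluster models are combined. After a drift, new clusters can supply fresh terminals without rebuilding the tree machinery.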
The journal, and in particular its resource reviews, have been running for 20 years. We summarise the GP literature, including top papers and authors, as seen by users of the genetic programming bibliography. We then revisit our original goals for the GPEM book reviews and compare them with what has been achieved.
The uncertain capacitated arc routing problem has many real-world applications in logistics. Genetic programming (GP) is a promising approach to training routing policies that make real-time decisions and handle uncertain events effectively. In the real world there are various problem domains, and no single routing policy works effectively in all of them. Instead of training in isolation, we can leverage the relatedness between problems and transfer knowledge from previously solved source problems to solve the target problem. Existing transfer methods are not effective enough because diversity is lost during knowledge transfer. To increase the diversity of the transferred knowledge, in this article we propose a novel GP method that removes phenotypic duplicates from the source individuals before using them to initialize the target population. Furthermore, if the transferred knowledge used in initialization already includes all the important knowledge explored for the source problem, it is more effective to explore new regions that were not explored for the source problem. We therefore propose novel genetic operators that prevent the search from revisiting the source individuals when solving the target problem. To speed up this revisit check, we adapt a powerful hashing method for routing policies that greatly improves the efficiency of the genetic operators. Our experimental results show that the proposed method significantly outperforms existing GP approaches with knowledge transfer in terms of both initial and final solution quality.
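The abstract does not specify how phenotypes are characterised or hashed. The sketch below therefore assumes a technique common in GP hyper-heuristics: a policy is characterised by the task it selects in each of a fixed set of reference decision situations, and that index tuple is used as a hash key. Everything here (task features, the lower-is-better priority convention, and the function names) is hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple

TaskFeatures = Tuple[float, ...]                 # hypothetical per-task attributes, e.g. (cost, demand)
RoutingPolicy = Callable[[TaskFeatures], float]  # GP priority function; lower value = preferred task
Phenotype = Tuple[int, ...]


def phenotype(policy: RoutingPolicy, situations: Sequence[Sequence[TaskFeatures]]) -> Phenotype:
    """Index of the task the policy would serve next in each reference decision situation."""
    return tuple(
        min(range(len(tasks)), key=lambda i: policy(tasks[i]))
        for tasks in situations
    )


def remove_phenotypic_duplicates(
    policies: Sequence[RoutingPolicy], situations: Sequence[Sequence[TaskFeatures]]
) -> List[RoutingPolicy]:
    """Keep the first policy for each distinct phenotype (set lookup is O(1) on average)."""
    seen: Set[Phenotype] = set()
    unique: List[RoutingPolicy] = []
    for policy in policies:
        key = phenotype(policy, situations)
        if key not in seen:
            seen.add(key)
            unique.append(policy)
    return unique


def revisits_source(policy: RoutingPolicy, source_keys: Set[Phenotype],
                    situations: Sequence[Sequence[TaskFeatures]]) -> bool:
    """True if an offspring behaves exactly like some already-explored source individual."""
    return phenotype(policy, situations) in source_keys


situations = [[(1.0, 5.0), (2.0, 1.0)], [(3.0, 0.0), (1.0, 4.0)]]
p_cost = lambda t: t[0]
p_cost_scaled = lambda t: 2 * t[0] + 1  # different tree, identical decisions
p_demand = lambda t: t[1]
unique = remove_phenotypic_duplicates([p_cost, p_cost_scaled, p_demand], situations)
print(len(unique))  # 2
```

Two syntactically different trees such as `t[0]` and `2 * t[0] + 1` make identical routing decisions, so only one is kept. The same key set can then be used to reject offspring that would revisit a source individual.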
Authors: Cheng, Zhi-Liang; Zhou, Wan-Huan; Garg, Ankit
Univ Macau, Dept Civil & Environm Engn, State Key Lab Internet Things Smart City, Taipa, Macao, Peoples R China
Shantou Univ, Guangdong Engn Ctr Struct Safety & Hlth Monitorin, Dept Civil & Environm Engn, Shantou, Peoples R China
Soil suction, an important parameter in the safety and risk assessment of geotechnical and green infrastructure, is strongly affected by plants and weather in the shallow soil layers of urban landscapes and green infrastructure. In this study, a computational model consisting of a drying-cycle model and a wetting-cycle model was developed using genetic programming to describe variations in soil suction in terms of selected influential parameters. The input data for model development were measured in a field monitoring test on the campus of the University of Macau. Soil suction was measured in the field at different distances (0.5 m, 1.5 m, and 3.0 m) from a tree at a constant depth of 20 cm, and the selected influential parameters were initial soil suction, air humidity, rainfall amount, cycle duration, and the ratio of the distance from the tree to the tree canopy. The performance analysis validates the efficiency and reliability of the proposed computational model. The importance of each input, and the coupled effect of each pair of input variables on the output, were investigated using global sensitivity analysis. We conclude that the proposed model, based on an artificial intelligence simulation method, describes the relationship between field soil suction in drying-wetting cycles and the selected input variables within an acceptable degree of error. Accordingly, it can serve as a tool for supporting geotechnical construction design and for assessing the safety and risk of geotechnical green infrastructure.
Image acquisition, segmentation, object detection, and tracking are essential parts of surveillance systems. Image filtering is usually employed as a preprocessing step to reduce the effect of motion or out-of-focus blur. In this paper, we propose a genetic programming (GP)-based blind image deconvolution filter. A GP-based numerical expression is developed for image restoration that optimally combines and exploits dependencies among features of the blurred image. To develop this function, a set of feature vectors is first formed by considering a small neighborhood around each pixel. In the second stage, the estimator is trained through a GP process that automatically selects and combines the useful feature information under a fitness criterion. The developed function is then applied to estimate the pixel intensities of the degraded images. The performance of the filter function is evaluated on various degraded image sequences, and our comparative analysis highlights the effectiveness of the proposed GP-based filter. (c) 2012 Elsevier Ltd. All rights reserved.
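The first stage, building one feature vector per pixel from its small neighborhood, can be sketched as follows. This minimal sketch assumes a grayscale image stored as nested lists, a square (2r+1)×(2r+1) window (3×3 by default), and edge replication at the borders; the paper's window size and border handling are not stated. The `estimator` argument stands in for the evolved GP expression, and the mean filter in the example is only a placeholder.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]


def neighborhood_features(img: Image, radius: int = 1) -> List[List[float]]:
    """One feature vector per pixel (row-major order): the intensities in its window."""
    h, w = len(img), len(img[0])
    features = []
    for y in range(h):
        for x in range(w):
            vec = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate edge rows
                    xx = min(max(x + dx, 0), w - 1)  # replicate edge columns
                    vec.append(img[yy][xx])
            features.append(vec)
    return features


def restore(img: Image, estimator: Callable[[List[float]], float], radius: int = 1) -> List[List[float]]:
    """Apply a pixel-intensity estimator to every neighborhood and reshape to the image size."""
    w = len(img[0])
    flat = [estimator(vec) for vec in neighborhood_features(img, radius)]
    return [flat[i:i + w] for i in range(0, len(flat), w)]


blurred = [[0.0, 0.0, 0.0],
           [0.0, 9.0, 0.0],
           [0.0, 0.0, 0.0]]
mean_filter = lambda v: sum(v) / len(v)  # placeholder for an evolved GP expression
print(restore(blurred, mean_filter)[1][1])  # 1.0
```

In the described method, GP would search over expressions of these window intensities, scored by a fitness criterion, rather than using a fixed filter.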