ISBN: 9783540772255 (print)
This paper presents a system for classifying e-mails into two categories, legitimate and fraudulent. The classifier is based on the serial application of three filters: a Bayesian filter that classifies the textual content of e-mails, a rule-based filter that classifies the non-grammatical content of e-mails and, finally, a filter based on an emulator of fictitious accesses, which classifies the responses from websites referenced by links contained in e-mails. The approach is hybrid, because it uses different classification methods, and also integrated, because it takes into account all kinds of data and information contained in e-mails. It aims to provide an effective and efficient classification. The system first applies fast and reliable classification methods, and only when the resulting classification decision is imprecise does it apply more complex analysis and classification methods.
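The serial, escalate-only-when-unsure control flow described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the score range, the `low` and `high` thresholds, and the stand-in filters that simply read precomputed scores are all assumptions made for the example.

```python
from typing import Callable, Sequence

# Hypothetical interface: each filter returns a fraud score in [0, 1].
Filter = Callable[[dict], float]

calls = []  # records which filters actually ran (for illustration only)


def make_filter(name: str) -> Filter:
    """Build a stand-in filter that reads a precomputed score from the e-mail dict."""
    def run(email: dict) -> float:
        calls.append(name)
        return email[name]
    return run


def classify_serially(email: dict, filters: Sequence[Filter],
                      low: float = 0.2, high: float = 0.8) -> str:
    """Apply filters in order and stop at the first confident decision."""
    score = 0.5
    for run in filters:
        score = run(email)
        if score <= low:
            return "legitimate"
        if score >= high:
            return "fraudulent"
    # No filter was confident: fall back to the last score (an assumption).
    return "fraudulent" if score >= 0.5 else "legitimate"


# Cheap filters first, the costly link-emulation filter last.
FILTERS = [make_filter("text"), make_filter("rules"), make_filter("links")]
```

When the first filter is confident, the later and more expensive filters are never invoked, which is the efficiency argument the abstract makes.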
Previous clustering ensemble algorithms usually use a consensus function to obtain a final partition from the outputs of the initial clusterings. In this paper, we propose a new clustering ensemble method that generates a new feature space from the initial clustering outputs. Multiple runs of an initial clustering algorithm such as k-means generate a new feature space, which is significantly better than the pure or normalized feature space. Therefore, running a simple clustering algorithm on the generated feature space can yield a final partition significantly better than one obtained from the pure data. For the initial clustering runs, we use a modification of k-means named "intelligent k-means", which is defined specifically for clustering ensembles. The results of the proposed method are presented using both simple k-means and intelligent k-means. Fast convergence and appropriate behavior are the most interesting points of the proposed method. Experimental results on real data sets show the effectiveness of the proposed method.
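One way to picture the generated feature space is to one-hot encode the cluster label from each initial run and concatenate the encodings. The sketch below assumes that encoding and uses plain k-means throughout; the abstract does not specify the encoding, and "intelligent k-means" is not reproduced here. Function names are hypothetical.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """Plain k-means; initial centers are drawn from the distinct points."""
    distinct = list(dict.fromkeys(tuple(p) for p in points))
    centers = rng.sample(distinct, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def ensemble_features(points, k, runs, seed=0):
    """Concatenate one-hot cluster labels from several k-means runs."""
    rng = random.Random(seed)
    features = [[] for _ in points]
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for row, label in zip(features, labels):
            row.extend(1.0 if j == label else 0.0 for j in range(k))
    return features


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
features = ensemble_features(points, k=2, runs=5)
# Final partition: a simple clustering run on the generated feature space.
final = kmeans_labels(features, 2, random.Random(1))
```

Each point ends up with `runs * k` binary features, and points that are repeatedly co-clustered receive identical or near-identical rows, so a simple final clustering separates them easily.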
Training multilayer neural networks is typically carried out using gradient descent techniques. Ever since the brilliant backpropagation (BP), the first gradient-based algorithm proposed by Rumelhart et al., novel training algorithms have appeared to improve several facets of the learning process for feed-forward neural networks. Learning speed is one of these. In this paper, a learning algorithm that applies linear least squares is presented. We offer the theoretical basis for the method, and its performance is illustrated by applying it to several examples, in which it is compared with other learning algorithms on well-known data sets. Results show that the new algorithm improves the learning speed of several backpropagation algorithms while preserving good optimization accuracy. Due to its performance and low computational cost, it is an interesting alternative even to second-order methods, particularly when dealing with large networks and training sets.
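The abstract does not say where linear least squares enters the training procedure. As a purely illustrative assumption, the sketch below shows one common place it can appear: with the hidden layer held fixed, the output weights of a tiny network `y = a * tanh(w_hidden * x) + b` have a closed-form least-squares solution, so no gradient iterations are needed for that layer. The network shape and the name `fit_output_layer` are hypothetical.

```python
import math


def fit_output_layer(xs, ys, w_hidden=1.0):
    """Closed-form least-squares fit of (a, b) in y = a * tanh(w_hidden * x) + b."""
    hs = [math.tanh(w_hidden * x) for x in xs]   # fixed hidden activations
    n = len(hs)
    mean_h = sum(hs) / n
    mean_y = sum(ys) / n
    var_h = sum((h - mean_h) ** 2 for h in hs)
    cov_hy = sum((h - mean_h) * (y - mean_y) for h, y in zip(hs, ys))
    a = cov_hy / var_h           # slope from the normal equations
    b = mean_y - a * mean_h      # intercept
    return a, b


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 * math.tanh(x) - 1.0 for x in xs]  # noise-free targets
a, b = fit_output_layer(xs, ys)
```

On noise-free targets the fit recovers the generating output weights in a single step, which hints at why least-squares steps can speed up learning relative to purely iterative updates.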
Preference learning has recently received a lot of attention within the machine learning field; in particular, learning by pairwise comparisons is a well-established technique in this field. We focus on the problem of learning the overall preference weights of a set of alternatives from the (possibly conflicting) uncertain and imprecise information given by a group of experts in the form of interval pairwise comparison matrices. Because of the complexity of real-world problems, incomplete information or knowledge, and the different patterns of the experts, interval data provide a flexible framework to account for uncertainty and imprecision. In this context, we propose a two-stage method in a distance-based framework, where the impact of the data certainty degree is captured. First, the group preference matrix that best reflects the imprecise information given by the experts is obtained. Then, the crisp preference weights and the associated ranking of the alternatives are derived from the obtained group matrix. The proposed methodology is made operational by using an Interval Goal Programming formulation.
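The two-stage shape of the method (aggregate the experts' interval matrices, then derive crisp weights and a ranking) can be illustrated with much simpler stand-in techniques. The sketch below does not implement the Interval Goal Programming formulation: stage 1 is replaced by a geometric mean of interval geometric midpoints, and stage 2 by the classical row geometric mean method. Both substitutions and all function names are assumptions for illustration.

```python
import math


def group_matrix(expert_matrices):
    """Stage 1 (stand-in): geometric mean, across experts, of each interval's
    geometric midpoint sqrt(lo * hi). Entries are (lo, hi) tuples."""
    m = len(expert_matrices)
    n = len(expert_matrices[0])
    return [
        [
            math.prod(math.sqrt(e[i][j][0] * e[i][j][1])
                      for e in expert_matrices) ** (1.0 / m)
            for j in range(n)
        ]
        for i in range(n)
    ]


def priority_weights(matrix):
    """Stage 2 (stand-in): normalized row geometric means."""
    n = len(matrix)
    row_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(row_means)
    return [g / total for g in row_means]


def ranking(weights):
    """Indices of the alternatives, best first."""
    return sorted(range(len(weights)), key=lambda i: -weights[i])


# Two experts who agree exactly; intervals are degenerate (lo == hi).
true_w = [0.5, 0.3, 0.2]
expert = [[(wi / wj, wi / wj) for wj in true_w] for wi in true_w]
weights = priority_weights(group_matrix([expert, expert]))
```

For a perfectly consistent matrix the row geometric mean recovers the underlying weights exactly, which makes it a convenient sanity check for any stage-2 procedure.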
The successful application of machine learning techniques to industrial problems places various demands on the collaborators. The system designers must possess appropriate analytical skills and technical expertise, and the management of the industrial or commercial partner must be sufficiently convinced of the potential benefits that they are prepared to invest money and equipment. Vitally, the collaboration also requires a significant investment of time from the end-users in order to provide training data from which the system can (hopefully) learn. This poses a problem if the developed machine learning system is not sufficiently accurate, as the users and management may view their input as wasted effort and lose faith in the process. In this paper we investigate techniques for making early predictions of the error rate achievable after further interactions. In particular, we show how decomposing the error into different components can lead to useful predictors of achievable accuracy, but that this is dependent on the choice of an appropriate sampling methodology.
The Video-on-Demand (VoD) application is popular because videos are delivered to users at any time and anywhere. The system load increases when many users access the VoD system simultaneously. The proposed architecture provides a load-balancing mechanism within a video server and serves users with a small start-up delay. Video storage is based on the probability that users request a video. Videos with a higher request probability are stored and replicated to guarantee retrieval. A parity generation scheme is employed to provide reliability for non-popular videos. The system is also capable of handling disk failures transparently, thereby providing a reliable service to the user.
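The abstract does not describe the parity scheme in detail. A minimal sketch, assuming simple XOR parity over equal-sized blocks (as in RAID-style layouts), shows how one lost block of a non-popular video could be rebuilt after a disk failure. The `popular_threshold` value and all function names are hypothetical.

```python
from functools import reduce


def placement(request_probability, popular_threshold=0.1):
    """Pick a protection strategy from the request probability (threshold is assumed)."""
    return "replicate" if request_probability >= popular_threshold else "parity"


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


blocks = [b"abcd", b"efgh", b"ijkl"]   # one block per disk
parity = xor_blocks(blocks)            # stored on a further disk
rebuilt = recover([blocks[0], blocks[2]], parity)  # disk holding blocks[1] failed
```

XOR parity costs one extra block per stripe instead of a full copy, which fits the abstract's split between replicating popular videos and protecting non-popular ones more cheaply.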
The performance of many learning methods is usually influenced by the class imbalance problem, where the training data is dominated by the instances belonging to one class. In this paper, we propose a novel method which combines random-forest-based techniques and sampling methods for effectively learning from imbalanced data. Our method is mainly composed of two phases: data cleaning and classification based on random forest. Firstly, the training data is cleaned through the elimination of dangerous negative instances. The data cleaning process is supervised by a negative-biased random forest, where the negative instances make up the major proportion of the training data in each of the trees in the forest. Secondly, we develop a variant of random forest in which each tree is biased towards the positive class to classify the data set, where a majority vote is used for prediction. In the experiments, we compared our method with other existing methods on real data sets, and the results demonstrate the significant performance improvement of our method in terms of the area under the ROC curve (AUC).
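The two forests differ mainly in how each tree's training sample is drawn. The sketch below covers only that sampling step and the majority vote, not tree induction or the cleaning rule; the exact class proportions used by the authors are not given, so `positive_fraction` is a hypothetical parameter.

```python
import random
from collections import Counter


def biased_bootstrap(labels, positive_fraction, size, rng):
    """Draw bootstrap indices with a chosen share of positive (label 1) instances."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(size * positive_fraction)
    # Sampling with replacement lets the minority class be over-represented.
    return rng.choices(pos, k=n_pos) + rng.choices(neg, k=size - n_pos)


def majority_vote(tree_predictions):
    """Final prediction: the label predicted by most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


labels = [1] * 5 + [0] * 95            # heavily imbalanced toy labels
rng = random.Random(0)
positive_biased = biased_bootstrap(labels, 0.7, 100, rng)  # classification phase
negative_biased = biased_bootstrap(labels, 0.1, 50, rng)   # cleaning phase
```

Each tree would then be trained on its own biased sample, so changing one parameter switches between the negative-biased forest used for cleaning and the positive-biased forest used for classification.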
This paper investigates the application of novelty detection techniques to the problem of drug profiling in forensic science. Numerous one-class classifiers are tried out, from the simple k-means to the more elaborate Support Vector Data Description (SVDD) algorithm. The target application is the classification of illicit drug samples as part of an existing trafficking network or as a new cluster. A unique chemical database of heroin and cocaine seizures is available and allows the methods to be assessed. Evaluation is done using the area under the ROC curve of the classifiers. Gaussian mixture models and the SVDD method are trained both with and without outlier examples, and it is found that providing outliers during training improves the classification performance in some cases. Finally, combination schemes of classifiers are also tried out. Results highlight methods that may guide the profiling methodology used in forensic analysis.
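A k-means-style one-class classifier can score a sample by its distance to the nearest centroid of the known (target) class, and the area under the ROC curve can be computed directly from such scores. The sketch below assumes the centroids are already fitted and uses made-up two-dimensional points rather than any chemical data; function names are hypothetical.

```python
import math


def novelty_score(sample, centroids):
    """Distance to the nearest target-class centroid; larger means more novel."""
    return min(math.dist(sample, c) for c in centroids)


def auc(outlier_scores, target_scores):
    """Area under the ROC curve as the probability that an outlier scores
    higher than a target sample (ties count as one half)."""
    wins = sum(
        (o > t) + 0.5 * (o == t)
        for o in outlier_scores
        for t in target_scores
    )
    return wins / (len(outlier_scores) * len(target_scores))


centroids = [(0.0, 0.0), (5.0, 5.0)]        # assumed to come from k-means
targets = [(0.5, 0.0), (5.0, 4.0)]          # samples from known clusters
outliers = [(10.0, 10.0), (-4.0, 3.0)]      # samples from a new cluster
area = auc([novelty_score(s, centroids) for s in outliers],
           [novelty_score(s, centroids) for s in targets])
```

An AUC of 1.0 means every outlier is scored as more novel than every target sample, while 0.5 corresponds to chance level.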
Due to the non-stationarity of EEG signals, online training and adaptation are essential to EEG-based brain-computer interface (BCI) systems. Asynchronous BCI offers more natural human-machine interaction, but it is a great challenge to train and adapt an asynchronous BCI online because the user's control intention and timing are usually unknown. This paper proposes a novel motor-imagery-based asynchronous BCI for controlling a simulated robot in a specifically designed environment which is able to provide the user's control intention and timing during online experiments, so that online training and adaptation of motor-imagery-based asynchronous BCI can be effectively investigated. This paper also proposes an online training method that attempts to automate the process of finding the optimal parameter values of the BCI system to deal with non-stationary EEG signals. Experimental results show that the proposed method for online training of asynchronous BCI significantly improves the performance.
In this paper, the effect of possible missing data on wind power estimation is examined. One month of wind speed data, obtained from the wind and solar observation station constructed at Iki Ey- Campus of Anadolu University, is used. A close correlation is found between consecutive wind speed data that are collected at 15-second intervals. A very-short-term wind speed forecasting model is built by using a two-input, one-output Adaptive Neuro-Fuzzy Inference System (ANFIS). First, some randomly selected data from the whole data set are discarded. Second, 10%, 20% and 30% of all data, randomly selected from a predefined interval (3-6 m/sec), are discarded, and the discarded data are forecast. Finally, the data are fitted to a Weibull distribution, the Weibull distribution parameters are obtained, and wind powers are estimated for all cases. The results show that missing data have a significant effect on wind power estimation and must be taken into account in wind studies. Furthermore, it is concluded that ANFIS is a convenient tool for this kind of prediction.
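The final step, going from Weibull parameters to a wind power estimate, can be sketched with the standard expression for mean wind power density, 0.5 * rho * c^3 * Gamma(1 + 3/k), where k is the shape and c the scale parameter. The abstract does not state how the distribution was fitted; the sketch assumes the common empirical approximation k = (sigma / mean)^-1.086, and the default air density of 1.225 kg/m^3 is likewise an assumption.

```python
import math


def weibull_from_moments(speeds):
    """Approximate Weibull shape k and scale c from the sample mean and
    standard deviation (empirical approximation, assumed for this sketch)."""
    n = len(speeds)
    mean = sum(speeds) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in speeds) / (n - 1))
    k = (std / mean) ** -1.086
    c = mean / math.gamma(1.0 + 1.0 / k)
    return k, c


def mean_power_density(k, c, air_density=1.225):
    """Mean wind power per unit rotor area (W/m^2) for Weibull(k, c) wind speeds."""
    return 0.5 * air_density * c ** 3 * math.gamma(1.0 + 3.0 / k)


speeds = [3.0, 4.0, 5.0, 6.0, 7.0]     # toy wind speeds in m/s
k, c = weibull_from_moments(speeds)
density = mean_power_density(k, c)
```

Because the power density depends on the cube of the scale parameter, discarding or mis-forecasting part of the speed record shifts the estimate noticeably, which is consistent with the sensitivity to missing data reported in the abstract.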