Ant Colony Algorithms have been successfully applied to solve combinatorial optimization problems. Subsequently, applications in data mining (DM) appeared, more specifically aiming to solve classification problems. The ...
This paper presents a parallelization of the incremental algorithm inc-k-msn, for mixed data and similarity functions that do not satisfy metric properties. The algorithm presented is suitable for processing large dat...
ISBN: 9783540772255 (print)
The proceedings contain 116 papers. The topics discussed include: support function machines; group decision making with triangular fuzzy linguistic variables; advanced forecasting and classification technique for condition monitoring of rotating machinery; a new recurring multistage evolutionary algorithm for solving problems efficiently; exploration of a text collection and identification of topics by clustering; fuzzy ridge regression with non-symmetric membership functions and quadratic models; load forecasting with support vector machines and semi-parametric method; support kernel machine-based active learning to find labels and a proper kernel simultaneously; knowledge extraction from unstructured surface meshes; the outer impartation information content of rules and rule sets; an engineering approach to data mining projects; and a framework to analyze biclustering results on microarray experiments.
ISBN: 9783642043932 (print)
This paper presents a multidisciplinary study on the application of statistical and neural models for analysing data on immissions of atmospheric pollution in urban areas. Data was collected from the network of pollution measurement stations in the Spanish Autonomous Region of Castile-Leon. Four pollution parameters and a pollution measurement station in the city of Burgos were used to carry out the study in 2007, during a period of just over six months. Pollution data are compared, their values are interrelated and relationships are established not only with the pollution variables, but also with different weeks of the year. The aim of this study is to classify the levels of atmospheric pollution in relation to the days of the week, trying to differentiate between working days and non-working days.
ISBN: 9783642043932 (print)
Analysis of data without labels is commonly subject to scrutiny by unsupervised machine learning techniques. Such techniques provide more meaningful representations, useful for better understanding of a problem at hand, than by looking only at the data itself. Although abundant expert knowledge exists in many areas where unlabelled data is examined, such knowledge is rarely incorporated into automatic analysis. Incorporation of expert knowledge is frequently a matter of combining multiple data sources from disparate hypothetical spaces. In cases where such spaces belong to different data types, this task becomes even more challenging. In this paper we present a novel immune-inspired method that enables the fusion of such disparate types of data for a specific set of problems. We show that our method provides a better visual understanding of one hypothetical space with the help of data from another hypothetical space. We believe that our model has implications for the field of exploratory data analysis and knowledge discovery.
We review a new form of self-organizing map which is based on a nonlinear projection of latent points into data space, identical to that performed in the Generative Topographic Mapping (GTM) (1). But whereas the GTM is an extension of a mixture of experts, this model is an extension of a product of experts (2). We show visualisation and clustering results on a data set composed of video data of lips uttering 5 Korean vowels. Finally we note that we may dispense with the probabilistic underpinnings of the product of experts and derive the same algorithm as a minimisation of mean squared error between the prototypes and the data. This leads us to suggest a new algorithm which incorporates local and global information in the clustering. Both of the new algorithms achieve better results than the standard Self-Organizing Map.
We present a machine learning approach to discover the agent dynamics that drives the evolution of the social groups in a community. We set up the problem by introducing an agent-based hidden Markov model for the agent dynamics: an agent's actions are determined by micro-laws. We learn the agent dynamics from the observed communications without knowing the state transitions. Our approach is to identify the appropriate micro-laws, which corresponds to identifying the appropriate parameters in the model. The model identification problem is then formulated as a mixed optimization problem. To solve the problem, we develop a multistage learning process for determining the group structure, the group evolution, and the micro-laws of a community based on the observed set of communications among actors, without knowing the semantic contents. Finally, to test the quality of our approximations and the feasibility of the approach, we present the results of extensive experiments on synthetic data as well as the results on real communities, such as Enron email and Movie newsgroups. Insight into agent dynamics helps us understand the driving forces behind social evolution.
The k-nearest neighbor method is a classifier based on the evaluation of the distances to each pattern in the training set. The edited version of this method consists of the application of this classifier with a subset of the complete training set in which some of the training patterns are excluded, in order to reduce the classification error rate. In recent works, genetic algorithms have been successfully applied to determine which patterns must be included in the edited subset. In this paper we propose a novel implementation of a genetic algorithm for designing edited k-nearest neighbor classifiers. It includes the definition of a novel mean square error based fitness function, a novel clustered crossover technique, and the proposal of a fast smart mutation scheme. In order to evaluate the performance of the proposed method, results using the breast cancer database, the diabetes database and the letter recognition database from the UCI machine learning benchmark repository have been included. Both error rate and computational cost have been considered in the analysis. The results obtained show the improvement achieved by the proposed editing method.
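The idea of an edited k-nearest neighbor classifier can be illustrated with a minimal sketch. It assumes a binary mask (one bit per training pattern) as the encoding a genetic algorithm would search over, and it scores a mask with a plain leave-one-out error rate rather than the mean square error based fitness proposed in the paper. The function names `knn_predict` and `error_rate` and the toy data are hypothetical.

```python
from collections import Counter


def knn_predict(patterns, labels, keep, query, k=1, exclude=None):
    """Majority vote among the k kept patterns closest to `query`.

    `keep` lists the indices of the edited subset; `exclude` drops one
    index so a pattern is never used to classify itself.
    """
    candidates = [i for i in keep if i != exclude]
    # Squared Euclidean distance is enough for ranking neighbours.
    candidates.sort(
        key=lambda i: sum((a - b) ** 2 for a, b in zip(patterns[i], query))
    )
    votes = Counter(labels[i] for i in candidates[:k])
    return votes.most_common(1)[0][0]


def error_rate(patterns, labels, mask, k=1):
    """Leave-one-out error over ALL patterns, using only the masked-in subset."""
    keep = [i for i, bit in enumerate(mask) if bit]
    wrong = sum(
        knn_predict(patterns, labels, keep, patterns[i], k, exclude=i) != labels[i]
        for i in range(len(patterns))
    )
    return wrong / len(patterns)


# Toy data (hypothetical): two clusters plus one noisy "B" inside cluster "A".
patterns = [(0.0,), (0.1,), (0.2,), (0.15,), (1.0,), (1.1,), (1.2,)]
labels = ["A", "A", "A", "B", "B", "B", "B"]

full_mask = [1, 1, 1, 1, 1, 1, 1]    # unedited training set
edited_mask = [1, 1, 1, 0, 1, 1, 1]  # noisy pattern excluded

full_error = error_rate(patterns, labels, full_mask)
edited_error = error_rate(patterns, labels, edited_mask)
```

In this toy case, excluding the single noisy pattern lowers the error rate, which is the effect an editing procedure looks for. A genetic algorithm would evolve the mask instead of setting it by hand.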
ISBN: 9783540772255 (print)
The importance of modularity in product innovation is analyzed in this paper. Through simulations with an agent-based modular economic model, we examine the significance of the use of a modular structure in new product designs in terms of its impacts upon customer satisfaction and firms' competitiveness. To achieve the above purpose, the automatically defined terminal is proposed and is used to modify the simple genetic programming.
ISBN: 9783540772255 (print)
We investigate a committee-based approach for active learning of real-valued functions. This is a variance-only strategy for selection of informative training data. As such it is shown to suffer when the model class is misspecified since the learner's bias is high. Conversely, the strategy outperforms passive selection when the model class is very expressive since active minimization of the variance avoids overfitting.
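A variance-only selection rule of this kind might be sketched as follows. The sketch assumes a committee of straight-line fits, each trained with one labelled point left out, and queries the unlabelled input where the members' predictions vary most. The leave-one-out committee, the linear model class, the function names and the data are all illustrative assumptions, not the construction used in the paper.

```python
from statistics import pvariance


def fit_line(points):
    """Least-squares fit y = a + b*x to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx  # requires at least two distinct x values
    return mean_y - b * mean_x, b


def build_committee(labelled):
    """One member per left-out point (an assumed way to diversify members)."""
    return [
        fit_line(labelled[:i] + labelled[i + 1:]) for i in range(len(labelled))
    ]


def select_query(committee, pool):
    """Return the pool input with the largest variance of committee predictions."""
    def disagreement(x):
        return pvariance([a + b * x for a, b in committee])

    return max(pool, key=disagreement)


# Hypothetical labelled data and unlabelled candidate pool.
labelled = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8)]
pool = [0.5, 1.5, 2.5, 10.0]

committee = build_committee(labelled)
query = select_query(committee, pool)
```

Because the rule looks only at disagreement among members, it ignores bias entirely: if every member is wrong in the same way (a misspecified model class), the committee agrees and the rule sees nothing to query, which is the weakness the abstract describes.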