ISBN: (Print) 9783540772255
Both the number and the complexity of data mining projects have increased in recent years. Unfortunately, there is currently no formal process model for this kind of project, and existing approaches are not suitable or complete enough. In some sense, the present situation is comparable to the one in software development that led to the 'software crisis' of the late 1960s. Software engineering matured on the basis of process models and methodologies, and data mining's evolution is running parallel to that of software engineering. The research described in this paper proposes a process model for data mining projects based on the study of current software engineering process models (IEEE Std 1074 and ISO 12207) and the most widely used data mining methodology, CRISP-DM (considered a de facto standard), as basic references.
We review a new form of self-organizing map which is based on a nonlinear projection of latent points into data space, identical to that performed in the Generative Topographic Mapping (GTM) [1]. But whereas the GTM is an extension of a mixture of experts, this model is an extension of a product of experts [6]. We show visualisation and clustering results on a data set composed of video data of lips uttering 5 Korean vowels and show that the new mapping achieves better results than the standard Self-Organizing Map.
A unified approach is proposed for sparse kernel data modelling that includes regression and classification as well as probability density function estimation. The orthogonal-least-squares forward selection method based on the leave-one-out test criteria is presented within this unified data-modelling framework to construct sparse kernel models that generalise well. Examples from regression, classification and density estimation applications are used to illustrate the effectiveness of this generic sparse kernel data modelling approach.
In this paper, a new type of interface agent is presented. This agent is oriented toward modeling systems for human-based computation. This kind of computation, which we consider a logical extension of the intelligent agent paradigm, is emerging as a valid approach to solving complex problems. First, the state of the art of interface agents is reviewed. Next, human-based computation is defined, and we show why it is necessary to extend the current typology of interface agents to model this new kind of computation. A new type of interface agent, oriented toward modeling this type of computational system, is then presented. Finally, two of the most representative applications of human-based computation are specified using this new typology.
Classification of micro-array data has been studied extensively, but only a small amount of research has addressed the classification of micro-array data involving more than two classes. This paper proposes a learning strategy that builds a multi-target classifier and takes advantage of well-known data mining techniques. To address the intrinsic difficulty of selecting features that improve classification accuracy, the paper considers a set of binary classifiers, each of which is devoted to predicting a single class of the multi-class problem. These classifiers act as local experts whose knowledge (about the features that are most correlated with each class value) is taken into account by the learning strategy when selecting an optimal set of features. Results of experiments performed on a publicly available dataset demonstrate the feasibility of the proposed approach.
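The idea of per-class "local experts" guiding feature selection can be illustrated with a minimal sketch. It is not the paper's method: it assumes that each expert simply ranks features by absolute Pearson correlation with a one-vs-rest target, and that the final feature set is the union of each expert's top-ranked features. The names `pearson`, `select_features`, and `top_k` are hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def select_features(samples, labels, top_k=1):
    """Union of the top_k features most correlated with each one-vs-rest target.

    Assumption for illustration: correlation ranking stands in for whatever
    knowledge the binary classifiers in the paper actually provide.
    """
    n_features = len(samples[0])
    columns = [[row[j] for row in samples] for j in range(n_features)]
    selected = set()
    for cls in sorted(set(labels)):
        # One-vs-rest target: 1.0 for the current class, 0.0 for all others.
        target = [1.0 if lab == cls else 0.0 for lab in labels]
        scores = [abs(pearson(col, target)) for col in columns]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        selected.update(ranked[:top_k])
    return sorted(selected)
```

Each class contributes its own most informative features, so a feature that matters for only one class is not crowded out by features that separate the other classes well.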
This paper proposes a novel model, the support function machine (SFM), for time series prediction. Two machine learning models, namely support vector machines (SVM) and procedural neural networks (PNN), are compared on time series problems, and they inspire the creation of SFM. SFM aims to extend support vectors to the spatiotemporal domain, in which each component of a vector is a function of time. Viewed as a function, SFM maps a vector function of time to a static vector. A learning algorithm for SFM, similar to the SVM training procedure, is presented; it is equivalent to solving a quadratic programming problem. Moreover, two practical examples are investigated, and the experimental results illustrate the feasibility of SFM for modeling time series prediction.
The power of the social values that help to shape or formulate our behavior patterns is inevitable, and how we have surreptitiously responded in our decision making to the hidden curriculum that derives from such social values can be just as significant. Through a machine learning approach, we are able to discover the agent dynamics that drive the evolution of the social groups in a community. To this end, we set up the problem by introducing an agent-based hidden Markov model, in which the acts of an agent are determined by micro-laws with unknown parameters. To solve the problem, we develop a multistage learning process for determining the micro-laws of a community based on an observed set of communications between actors, without the semantic contents. We present the results of extensive experiments on synthetic data as well as some results on real communities, e.g., Enron email and movie newsgroups.
Recently, researchers have introduced methods to develop reusable knowledge in reinforcement learning (RL). In this paper, we define simple principles for combining skills in reinforcement learning. We present a skill combination method that uses trained skills to solve different tasks in an RL domain. Through this combination method, composite skills can be used to express tasks at a high level, and they can also be reused for different tasks within the same problem domain. The method generates an abstract task representation based on normal reinforcement learning, which decreases the information coupling of states and thus improves the agent's learning. The experimental results demonstrate that the skill combination method can effectively reduce the learning space and so accelerate the learning of the RL agent. We also show in the examples that different tasks can be solved by combining simple reusable skills.
DNA sequence is an important determinant of the positioning, stability, and activity of nucleosomes, yet the molecular basis of these remains elusive. Positioned nucleosomes are believed to play an important role in transcriptional regulation and in the organization of chromatin in cell nuclei. With the completion of genome projects for many organisms, sequence mining has received considerable and increasing attention. Much effort has been devoted to detecting periodicity in DNA sequences, namely in the DNA segments that wrap around the histone proteins. In this paper, we describe and apply a dynamic periodicity detection algorithm to discover periodicity in DNA sequences. Our algorithm uses a suffix tree as the underlying data structure. The proposed approach considers the periodicity of alternative substrings, in addition to using a dynamic window to detect the periodicity of certain instances of substrings. We demonstrate the applicability and effectiveness of the proposed approach by reporting test results on three data sets.
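To make the notion of substring periodicity concrete, here is a naive sketch. It is deliberately not the suffix-tree algorithm described in the abstract: it scans fixed periods by brute force and handles neither alternative substrings nor a dynamic window. The functions `occurrences` and `periodic_patterns` and the `min_conf` threshold are hypothetical names introduced for illustration.

```python
def occurrences(seq, pattern):
    """Start indices of all (possibly overlapping) occurrences of pattern in seq."""
    hits = []
    i = seq.find(pattern)
    while i != -1:
        hits.append(i)
        i = seq.find(pattern, i + 1)
    return hits


def periodic_patterns(seq, pattern, min_conf=0.8, max_period=None):
    """Return (period, offset, confidence) triples for a given pattern.

    Confidence is the fraction of positions offset, offset + period,
    offset + 2 * period, ... at which the pattern actually starts.
    """
    hits = set(occurrences(seq, pattern))
    last_start = len(seq) - len(pattern)
    if max_period is None:
        max_period = len(seq) // 2
    results = []
    for period in range(len(pattern), max_period + 1):
        for offset in range(period):
            expected = range(offset, last_start + 1, period)
            if len(expected) < 2:
                continue  # a single position cannot show periodicity
            conf = sum(1 for pos in expected if pos in hits) / len(expected)
            if conf >= min_conf:
                results.append((period, offset, conf))
    return results
```

For the sequence `"ACGTACGTACGTACGT"` and the pattern `"AC"`, the scan reports a perfect period of 4 starting at offset 0, along with its multiple 8. The brute-force scan costs roughly quadratic time in the sequence length, which is the kind of overhead an index such as a suffix tree is meant to avoid.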
Authors: Al-Shahib, Ali; Gilbert, David; Breitling, Rainer
Affiliations: Univ Birmingham, Dept Elect Elect & Comp Engn, Biomed Informat Signals & Syst Res Lab, Birmingham, W Midlands, England; Univ Glasgow, BioInformat Res Ctr, Dept Comp Sci, Glasgow G12 8QQ, Lanark, Scotland; Univ Groningen, Groningen Biomolecular Sci & Biotechnol Inst, Groningen Bioinformat Ctr, Haren, Netherlands
Much work has been done to identify species-specific proteins in sequenced genomes and hence to determine their function. We assumed that such proteins have specific physico-chemical properties that discriminate them from proteins in other species. In this paper, we examine the validity of this assumption by comparing proteins and their properties from different bacterial species using Support Vector Machines (SVM). We show that by training on selected protein sequence properties, SVMs can successfully discriminate between proteins of different species. This finding takes us a step closer to inferring the functional characteristics of these proteins.