During their synthesis, a large fraction of proteins are directed to the secretory pathway. Several models aim to distinguish between different destinations along this pathway; however, they rarely distinguish between the known stages of this translocation process. This paper presents a translocation probability function that models the protein SRP-recruitment process, the first stage of the secretory pathway. It unifies groups of proteins with distinct final destinations, allowing more specific sorting to be done in due course and mirroring the hierarchical nature of secretory translocation. We apply conditional random fields to evaluate the prediction accuracy of a full sequence model. Introducing the translocation function yields a substantial improvement over a model based only on properties relevant to the subsequent stages and final destinations. For discriminating secretory, signal peptide (SP)-equipped proteins from non-secretory proteins, a correlation coefficient of 0.98 is achieved, a level of performance matched only by specialized SP predictors. Transmembrane proteins cause considerable confusion in signal peptide predictors, but they fit naturally into our transparent design and reduce the performance of the translocation function only slightly. The proposed function and model assist efforts to uncover the localization and function of the growing number of protein sequences. Applying our model, we estimate with high confidence that about 27% of human and 29% of mouse proteins are associated with the secretory pathway. (C) 2008 Elsevier Ltd. All rights reserved.
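The abstract reports a "correlation coefficient" for the binary secretory vs. non-secretory task without naming it. We assume it is the Matthews correlation coefficient (MCC), which is commonly used for this kind of discrimination. The minimal sketch below computes MCC from label lists. The function name and the 1/0 label encoding are illustrative assumptions.

```python
import math

def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding: 1 = secretory (SP-equipped), 0 = non-secretory.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A value of 1.0 means perfect agreement, 0.0 means chance level and -1.0 means total disagreement.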
ISBN (print): 9781424441990
At present, there are no methods for learning dynamic Bayesian network structure from non-time-symmetric data. In this paper, a method for learning dynamic Bayesian network structure from non-time-symmetric data is developed by means of transfer variables. In this method, the transfer variables between two adjacent time slices are first learned by combining a star structure with Gibbs sampling. A partial dynamic Bayesian network structure is then built using node ordering and a local search-and-score method. The complete dynamic Bayesian network structure is obtained by extending this partial structure along the time series.
ISBN (print): 9781424441990
Feature selection is an important problem in machine learning and pattern recognition. Data-stream classification involves high-dimensional, sparse data whose dimensionality must be compressed, so feature selection methods suited to data-stream classification are of great value; however, this area currently lacks in-depth study. This paper summarizes current research on feature selection for data-stream classification and analyzes the characteristics of the different methods. Based on the principle of maximum entropy combined with naive Bayes, the attributes of data-stream tuples are selected and divided into two subsets according to their merit, so as to improve the results of the C4.5 classifier. Experiments show that streamMEFS not only saves classification time but also improves classification quality.
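The abstract gives no algorithmic detail for streamMEFS. To illustrate the step of dividing attributes into two subsets by merit, the sketch below swaps in plain information-gain ranking for discrete attributes. This is not the paper's maximum-entropy/naive Bayes criterion. It splits the ranked attributes into a better half and a worse half. The function names and the half-split rule are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """Entropy reduction from conditioning the labels on one discrete attribute."""
    n = len(labels)
    cond = 0.0
    for v in set(column):
        subset = [y for x, y in zip(column, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

def split_features(rows, labels):
    """Rank attribute indices by information gain; return (better half, worse half)."""
    n_attrs = len(rows[0])
    scores = [(info_gain([r[j] for r in rows], labels), j) for j in range(n_attrs)]
    ranked = [j for _, j in sorted(scores, key=lambda s: -s[0])]
    half = (n_attrs + 1) // 2  # hypothetical split point: ceiling of half
    return ranked[:half], ranked[half:]
```

In a streaming setting, the counts behind `info_gain` would be maintained incrementally over a window rather than recomputed from stored rows. That part is left out here.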
ISBN (print): 9781424441990
AdaBoost has become a representative ensemble learning algorithm because of its excellent performance. However, its long training time has drawn complaints, and this drawback limits its practical application. Bagging trains quickly and supports parallel computing. One important factor affecting the performance of ensemble learning is the diversity of the component learners. Based on this view, this paper proposes a new algorithm that uses clustering and Boosting to prune Bagging ensembles. Its learning efficiency is close to that of Bagging, and its performance is close to that of AdaBoost. Furthermore, the new algorithm can detect noisy data in the original samples using a cascade technique, achieving better noise-detection results.
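The abstract does not specify the clustering step. The minimal sketch below shows one hypothetical way to prune a bagged ensemble using diversity. It groups component learners whose validation predictions disagree on at most `max_dis` of the samples, using simple leader clustering. It then keeps the most accurate learner from each group. The Boosting and cascade noise-detection parts of the paper are not modeled. The input format, threshold and function names are assumptions.

```python
def disagreement(a, b):
    """Fraction of validation samples on which two learners' predictions differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def prune_by_clustering(preds, y_val, max_dis=0.1):
    """Hypothetical diversity-based pruning of a bagged ensemble.

    preds: one list of validation predictions per bagged learner.
    Returns the sorted indices of the retained learners.
    """
    clusters = []  # each cluster is a list of learner indices; c[0] is its leader
    for i, p in enumerate(preds):
        for c in clusters:
            if disagreement(preds[c[0]], p) <= max_dis:
                c.append(i)  # too similar to this leader: redundant member
                break
        else:
            clusters.append([i])  # sufficiently diverse: start a new cluster

    def accuracy(i):
        return sum(x == y for x, y in zip(preds[i], y_val)) / len(y_val)

    # Keep the most accurate member of each cluster (ties go to the earliest index).
    return sorted(max(c, key=accuracy) for c in clusters)
```

Raising `max_dis` merges more learners into each cluster and so prunes more aggressively.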
ISBN (print): 9780615306292
One of the unresolved problems in constructing intelligent tutoring systems is the acquisition of background knowledge, either for specifying the teaching strategy or for constructing the student model, which identifies deviations in students' behavior. In this paper, we argue that sequential pattern mining and constraint relaxations can be used to acquire that knowledge automatically. We show that the constrained pattern mining methodology we use can solve this problem in a way that is difficult to achieve with other approaches.
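The minimal sketch below illustrates how a constraint shapes sequential pattern mining, and what relaxing it does. It computes the support of a pattern over a database of event sequences under a maximum-gap constraint. The gap is the number of items that may be skipped between consecutive matched pattern elements. The student-log events, the gap constraint and the function names are hypothetical; the abstract does not state which constraints the authors relax.

```python
def occurs(seq, pattern, max_gap):
    """True if pattern is a subsequence of seq with at most max_gap items
    skipped between consecutive matched elements (backtracking search)."""
    def match_from(k, pos):
        # pattern[k] is matched at seq[pos]; try to place pattern[k + 1].
        if k == len(pattern) - 1:
            return True
        for nxt in range(pos + 1, min(len(seq), pos + max_gap + 2)):
            if seq[nxt] == pattern[k + 1] and match_from(k + 1, nxt):
                return True
        return False

    if not pattern:
        return True
    return any(seq[i] == pattern[0] and match_from(0, i) for i in range(len(seq)))

def support(db, pattern, max_gap):
    """Fraction of sequences in db that contain the pattern under the gap constraint."""
    return sum(occurs(s, pattern, max_gap) for s in db) / len(db)

# Hypothetical tutoring-system logs, one event sequence per student session.
logs = [
    ["read", "quiz", "hint", "answer"],
    ["read", "answer"],
    ["quiz", "read", "hint", "hint", "answer"],
]
```

Relaxing the constraint from `max_gap=0` to `max_gap=2` raises the support of `["read", "answer"]` from 1/3 to 1.0. A relaxation like this can surface behavior patterns that a strict constraint hides.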
ISBN (print): 9781424418367
Extracting loosely structured data records (DRs) has wide applications in many domains, such as forum pattern recognition, blog data analysis, and book and news review analysis. Existing methods work well only for strongly structured DRs. In this paper, we address the problem of extracting loosely structured DRs by mining strict patterns. Our method uses both content features and tag tree features to recognize loosely structured DRs, and we propose a new approach to extract the DRs automatically. Our experimental study demonstrates that the method is both effective and robust in practice.
ISBN (print): 0769530907
The proceedings contain 145 papers. The topics discussed include: advancing knowledge discovery and data mining; knowledge management in the ubiquitous software development; a novel network intrusion detection system (NIDS) based on signatures search of data mining; mining high utility itemsets in large high dimensional data; effective pruning strategies for sequential pattern mining; cooperation forensic computing research; grasping related words of unknown word for automatic extension of lexical dictionary; a novel website structure optimization model for more effective web navigation; average fuzzy direction based handwritten Chinese characters recognition approach; the data mining technology based on CIMS and its application on automotive remanufacturing; an empirical study on improving the manufacturing informatization index system of China; and centrality research on the traditional Chinese medicine network.
ISBN (print): 9789898111258
The proceedings contain 14 papers. The topics discussed include: descriptive analysis of image data: basic models; media analysis and the algorithm ontology; descriptive approach to medical image analysis - substantiation and interpretation; shape modeling for the analysis of heart deformation patterns; fast multi-view evaluation of data represented by symmetric clusters; search algorithm and the distortion analysis of fine details of real images; a proposal for automatic inference of pressure ulcers grade based on wound images and patient data; an image mining medical warehouse; geo-located image categorization and location recognition; pearling: stroke segmentation with crusted pearl strings; automatic target retrieval in a video surveillance task; and learning probabilistic models for recognizing faces under pose variations.
ISBN (print): 9780769531182
Image mining with pattern recognition methods has been widely used to extract knowledge from images. As a general rule, the number of features should be kept small to improve the efficiency and speed of mining. This paper presents wavelet methods for feature compression and evaluates them against data-driven feature selection, using the image-mining project on the pathogenic yeast Cryptococcus neoformans as an example. The experiments show that the wavelet methods for feature compression are almost as effective as data-driven feature selection at identifying different pathogen conditions. The experiments are built on the training image set and evaluated on a new set of test images using the machine learning tool WEKA.
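The abstract does not say which wavelet family or decomposition depth was used. The minimal sketch below assumes an orthonormal Haar transform. It compresses a feature vector by keeping only the approximation coefficients, so each level halves the number of features. The function names and the choice of Haar are assumptions.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def compress(features, levels):
    """Keep only the approximation coefficients after `levels` Haar steps.

    The output length is len(features) / 2**levels; the detail coefficients
    (fine-scale variation) are discarded.
    """
    x = list(features)
    for _ in range(levels):
        if len(x) % 2:
            raise ValueError("feature length must be divisible by 2**levels")
        x, _ = haar_step(x)
    return x
```

Discarding the detail coefficients keeps the coarse shape of the feature vector. This is the trade-off the paper compares against data-driven feature selection.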
ISBN (print): 9781424433216
Recent studies have shown that the Gibbs density function can model complex patterns and that a constrained maximum entropy formulation affords a powerful means of estimating its parameters from pattern class data. The theory, developed in the context of learning a prior model of natural images, has been applied successfully to the synthesis of textures and shapes and to pattern classification. The basic parameter estimation algorithm is a gradient algorithm that maximizes an entropy criterion under constraints. The purpose of this study is to investigate a Gibbsian Kohonen neural network, that is, a Kohonen network that can learn these constrained maximum entropy Gibbs density parameters for pattern representation and classification. Experiments in handwritten character classification verify the validity and efficiency of the method.
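For a Gibbs density $p(x) \propto \exp(-\sum_k \lambda_k f_k(x))$, the constrained maximum entropy solution matches model and data feature expectations. The log-likelihood gradient is $\partial \log L / \partial \lambda_k \propto E_p[f_k] - E_{\text{data}}[f_k]$. The minimal sketch below performs this gradient ascent on a tiny discrete state space, where $E_p$ can be computed exactly by enumeration. Real image models usually need sampling instead, because the normalizer is intractable. The Kohonen network from the paper is not modeled, and the states, feature and data are illustrative assumptions.

```python
import math

def model_expectations(states, feats, lam):
    """Exact E_p[f_k] under p(x) proportional to exp(-sum_k lam_k * f_k(x))."""
    weights = [math.exp(-sum(l * f for l, f in zip(lam, feats(x)))) for x in states]
    z = sum(weights)
    return [sum(w * feats(x)[j] for w, x in zip(weights, states)) / z
            for j in range(len(lam))]

def fit_gibbs(states, feats, data, lr=0.1, iters=2000):
    """Gradient ascent on log-likelihood until E_p[f] matches the data averages."""
    k = len(feats(data[0]))
    target = [sum(feats(x)[j] for x in data) / len(data) for j in range(k)]
    lam = [0.0] * k
    for _ in range(iters):
        model = model_expectations(states, feats, lam)
        # d log L / d lam_j is proportional to E_p[f_j] - E_data[f_j].
        lam = [l + lr * (m - t) for l, m, t in zip(lam, model, target)]
    return lam, target

# Illustrative example: states 0..3, a single feature f(x) = x.
states = [0, 1, 2, 3]
feats = lambda x: [x]
lam, target = fit_gibbs(states, feats, data=[0, 0, 1, 2])
```

The data mean here is 0.75, below the uniform-distribution mean of 1.5. The fitted $\lambda$ is therefore positive, which down-weights larger states.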