Escherichia coli (E. coli) K12 was sequenced in 1997. Its 4,639,221-base-pair DNA sequence contains 4288 annotated protein-coding genes, 38 percent of which have no attributed function. One of the major problems in predicting prokaryotic promoters is locating the spacers between the -35 box and the -10 box and between the -10 box and the transcription start site. In this paper, we adopt the expectation-maximization (EM) algorithm to accurately locate the promoter regions. A new purine-pyrimidine encoding method is proposed to reduce the dimensionality of the training data; with an appropriate choice of coding factor, the heavy computational and memory demands on the system can be avoided. The most representative features are then used to train learning vector quantization (LVQ) networks. Simulation results show that the precision of promoter prediction with the proposed encoding is approximately the same as with the traditional encoding method.
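The dimensionality reduction behind a purine-pyrimidine encoding can be illustrated with a minimal sketch. It assumes purines (A, G) map to 1 and pyrimidines (C, T) map to 0, and it treats the "coding factor" as a hypothetical parameter `k` that packs `k` consecutive bits into one integer feature; the paper's actual mapping and coding factor may differ.

```python
# Sketch of a purine/pyrimidine (R/Y) encoding for DNA windows.
# Assumptions (not from the paper): A/G -> 1, C/T -> 0, and a hypothetical
# coding factor k that packs k consecutive bits into one integer feature.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ry_bits(seq: str) -> list[int]:
    """Map each base to 1 (purine) or 0 (pyrimidine)."""
    bits = []
    for base in seq.upper():
        if base in PURINES:
            bits.append(1)
        elif base in PYRIMIDINES:
            bits.append(0)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return bits


def pack(bits: list[int], k: int) -> list[int]:
    """Group k bits (most significant first) into one integer; the last group may be shorter."""
    features = []
    for start in range(0, len(bits), k):
        value = 0
        for b in bits[start:start + k]:
            value = (value << 1) | b
        features.append(value)
    return features


def one_hot_length(seq: str) -> int:
    """Feature length of a conventional 4-symbol one-hot encoding, for comparison."""
    return 4 * len(seq)


# Illustrative window, not taken from the paper.
window = "TTGACAATTAATCATCGGCTCGTATAATGTGTGG"
bits = ry_bits(window)
features = pack(bits, k=4)
print(f"one-hot: {one_hot_length(window)}, R/Y bits: {len(bits)}, packed (k=4): {len(features)}")
```

A 34-base window shrinks from 136 one-hot values to 34 bits and then to 9 packed features, at the cost of discarding which purine or pyrimidine was present.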
As systems become ever more complex and integrated, their reliability becomes increasingly important. Remaining useful life (RUL) estimation methods that describe a system's degradation process with a random-effects degradation model, such as the Wiener process, have been widely used. However, the conventional Wiener-process-based degradation model considers only the current monitoring data and ignores historical degradation data, which makes RUL prediction inaccurate. Furthermore, in engineering practice, data are often missing because of sensor networks, the long life cycle of the system, and similar causes, leading to unsatisfactory results. This paper contributes an RUL re-prediction method based on the Wiener process that combines the system's current monitoring status with its historical degradation data. In the initial prediction stage, the Wiener process describes the system's degradation, the drift and diffusion coefficients are estimated by the expectation-maximization (EM) algorithm, and a dynamic Bayesian network (DBN) model of system performance degradation is established to handle the uncertainty caused by missing data. In the re-prediction stage, n groups of performance degradation monitoring data and historical predicted data are combined to calculate the basic degradation in each stage of the Wiener process, and DBNs are used for modeling. The RUL is obtained as the time difference between the detection point and the predicted failure point, which is in turn determined by the failure threshold. A subsea Christmas tree system is used as a case study to demonstrate the proposed approach.
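The Wiener-process part of this approach can be sketched for the simplest case of a fully observed path. With complete data, the maximum-likelihood drift and diffusion estimates have closed forms (EM is only needed once data go missing), and the RUL is the first-passage time to the failure threshold, which follows an inverse Gaussian distribution. This sketch does not cover the paper's historical-data fusion or its DBN handling of missing data; the simulated parameters are illustrative assumptions.

```python
import math
import random


def estimate_wiener(times, values):
    """MLE of drift mu and diffusion sigma^2 from a fully observed path.

    For X(t) = X(0) + mu * t + sigma * B(t), increments dx_i over dt_i are
    independent N(mu * dt_i, sigma^2 * dt_i).
    """
    dts = [t1 - t0 for t0, t1 in zip(times, times[1:])]
    dxs = [x1 - x0 for x0, x1 in zip(values, values[1:])]
    mu = sum(dxs) / sum(dts)
    sigma2 = sum((dx - mu * dt) ** 2 / dt for dx, dt in zip(dxs, dts)) / len(dts)
    return mu, sigma2


def rul_mean(current, threshold, mu):
    """Mean first-passage time to the threshold (inverse Gaussian mean)."""
    return (threshold - current) / mu


def rul_pdf(l, current, threshold, mu, sigma2):
    """Inverse Gaussian density of the remaining useful life l > 0."""
    gap = threshold - current
    return gap / math.sqrt(2 * math.pi * sigma2 * l ** 3) * math.exp(
        -((gap - mu * l) ** 2) / (2 * sigma2 * l)
    )


def simulate_path(mu, sigma, dt, n, x0=0.0, seed=0):
    """Simulate a degradation path on a regular grid (illustrative data only)."""
    rng = random.Random(seed)
    times, values = [0.0], [x0]
    for i in range(1, n + 1):
        times.append(i * dt)
        values.append(values[-1] + mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return times, values


times, values = simulate_path(mu=0.5, sigma=0.2, dt=0.1, n=2000)
mu_hat, s2_hat = estimate_wiener(times, values)
print(mu_hat, s2_hat, rul_mean(values[-1], 120.0, mu_hat))
```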
Background: Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model. Results: Here, probabilistic principal component analysis (PPCA), which addresses some of the limitations of PCA, is reviewed and extended. A novel extension of PPCA, called probabilistic principal component and covariates analysis (PPCCA), is introduced, which provides a flexible approach to jointly modeling metabolomic data and additional covariate information. The use of a mixture of PPCA models for discovering the number of inherent groups in metabolomic data is demonstrated. The jackknife technique is employed throughout to construct confidence intervals for the estimated model parameters. The optimal number of principal components is determined with the Bayesian Information Criterion (BIC), modified to address the high dimensionality of the data. Conclusions: The methods presented are illustrated through an application to metabolomic data sets. Jointly modeling metabolomic data and covariates was successfully achieved and has the potential to provide deeper insight into the underlying data structure. Examining confidence intervals for model parameters, such as the loadings, allows for a principled and clear interpretation of the underlying data structure. A software package called MetabolAnalyze, freely available for the R statistical software, has been developed to facilitate implementation of the presented methods in the metabolomics field.
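The core PPCA model can be fitted in closed form: the maximum-likelihood loadings come from the leading eigenvectors of the sample covariance, and the noise variance is the average of the discarded eigenvalues (Tipping and Bishop's solution). This sketch pairs that fit with a standard BIC for choosing the number of components; the paper uses a modified BIC and the jackknife, neither of which is reproduced here, and the simulated data are illustrative.

```python
import numpy as np


def ppca_ml(X, q):
    """Closed-form maximum-likelihood PPCA fit (Tipping & Bishop, 1999).

    X: (N, d) data matrix, q: number of principal components (q < d).
    Returns the mean, loadings W (d, q) and isotropic noise variance sigma2.
    """
    N, d = X.shape
    mu = X.mean(axis=0)
    S = (X - mu).T @ (X - mu) / N                 # ML sample covariance
    evals, evecs = np.linalg.eigh(S)              # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                     # average discarded variance
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def ppca_loglik(X, mu, W, sigma2):
    """Gaussian log-likelihood with covariance C = W W^T + sigma2 I."""
    N, d = X.shape
    C = W @ W.T + sigma2 * np.eye(d)
    Xc = X - mu
    S = Xc.T @ Xc / N
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * N * (d * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(C, S)))


def bic(X, q):
    """Standard BIC (lower is better); the paper uses a modified variant."""
    N, d = X.shape
    mu, W, sigma2 = ppca_ml(X, q)
    k = d + d * q - q * (q - 1) / 2 + 1           # mean + loadings (up to rotation) + sigma2
    return -2 * ppca_loglik(X, mu, W, sigma2) + k * np.log(N)


rng = np.random.default_rng(0)
Z = rng.standard_normal((500, 2))                 # latent scores
W_true = rng.standard_normal((10, 2))             # illustrative loadings
X = Z @ W_true.T + 0.1 * rng.standard_normal((500, 10))
best_q = min(range(1, 6), key=lambda q: bic(X, q))
mu, W, sigma2 = ppca_ml(X, best_q)
print(best_q, sigma2)
```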
This paper is concerned with the identification of nonlinear systems with multiple, correlated scheduling variables. Multiple autoregressive exogenous (ARX) models are identified at different process operating conditions, and a normalized exponential function, which serves as the probability density function governing when each local ARX model takes effect, is then used to combine all the local models into a representation of the complete dynamics of the nonlinear system. The parameters of the local ARX models and of the exponential functions are estimated simultaneously within the expectation-maximization (EM) framework. A numerical example demonstrates the proposed identification method.
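How a normalized exponential function blends local ARX models can be shown with a short sketch. It assumes the weights are a softmax of negative squared distances between the scheduling vector and hypothetical operating-point centers with a common `width`; the paper's exact parameterization and its EM estimation of these parameters are not shown.

```python
import math


def validity_weights(s, centers, width):
    """Normalized exponential (softmax) weights for each local model.

    s: scheduling vector; centers: one hypothetical operating point per
    local model; width: hypothetical common spread parameter.
    """
    scores = [-sum((si - ci) ** 2 for si, ci in zip(s, c)) / (2 * width ** 2) for c in centers]
    m = max(scores)                               # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def arx_predict(theta, y_past, u_past):
    """Local ARX one-step prediction: y[k] = sum a_i y[k-i] + sum b_j u[k-j]."""
    a, b = theta
    return sum(ai * yi for ai, yi in zip(a, y_past)) + sum(bj * uj for bj, uj in zip(b, u_past))


def global_predict(s, centers, width, thetas, y_past, u_past):
    """Blend the local ARX predictions with their validity weights."""
    w = validity_weights(s, centers, width)
    return sum(wi * arx_predict(th, y_past, u_past) for wi, th in zip(w, thetas))


centers = [(0.0, 0.0), (1.0, 1.0)]                # two illustrative operating points
thetas = [([0.9], [0.1]), ([0.5], [0.5])]         # (a, b) for each local ARX(1, 1) model
print(global_predict((0.5, 0.5), centers, 0.3, thetas, y_past=[2.0], u_past=[1.0]))
```

Midway between the two operating points, both models get weight 0.5, so the blended prediction is the average of the local predictions 1.9 and 1.5.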
In this paper, we consider de-interleaving a finite number of stochastic parametric sources. The sources are modeled as independent autoregressive (AR) processes, and we assume that, under a Markovian switching policy, the different sources transmit signals over the same single channel. The receiver records a 1-bit quantized version of the transmitted signal and aims to identify the sequence of active sources. Once the source sequence has been identified, the characteristics (parameters) of each source are estimated.
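The observation model can be simulated in a few lines, and once the active-source sequence is known, an AR(1) coefficient can be recovered from sign data alone via the arcsine law: for a zero-mean Gaussian pair with correlation ρ, the probability of equal signs is 1/2 + arcsin(ρ)/π. This is a sketch under stated assumptions (AR(1) sources that keep evolving while inactive, illustrative parameters); it does not implement the paper's source-sequence identification.

```python
import math
import random


def simulate(n, ar_coeffs, noise_std, P, seed=0):
    """Simulate Markov-switched AR(1) sources observed through a 1-bit quantizer.

    ar_coeffs[i], noise_std[i]: AR(1) parameters of source i (illustrative).
    P[i][j]: probability of switching from source i to source j.
    Assumption: every source keeps evolving even while it is not transmitting.
    Returns the active-source sequence and the sign (+1/-1) observations.
    """
    rng = random.Random(seed)
    m = len(ar_coeffs)
    states = [0.0] * m
    active = rng.randrange(m)
    labels, obs = [], []
    for _ in range(n):
        for i in range(m):
            states[i] = ar_coeffs[i] * states[i] + rng.gauss(0.0, noise_std[i])
        labels.append(active)
        obs.append(1 if states[active] >= 0 else -1)
        active = rng.choices(range(m), weights=P[active])[0]
    return labels, obs


def estimate_per_source(labels, obs, m):
    """Arcsine-law AR(1) estimate per source, using consecutive samples of that source."""
    estimates = []
    for i in range(m):
        same = agree = 0
        for k in range(1, len(obs)):
            if labels[k - 1] == labels[k] == i:
                same += 1
                agree += obs[k - 1] == obs[k]
        estimates.append(math.sin(math.pi * (agree / same - 0.5)))
    return estimates


P = [[0.95, 0.05], [0.10, 0.90]]
labels, obs = simulate(50000, ar_coeffs=[0.8, -0.5], noise_std=[1.0, 1.0], P=P)
print(estimate_per_source(labels, obs, 2))
```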
This paper is concerned with the identification of nonlinear systems with a noisy scheduling variable, where the system measurement has an unknown time delay. Autoregressive exogenous (ARX) models are selected as the local models, and multiple local models are identified along the process operating points. The dynamics of the nonlinear system are represented by associating a normalized exponential function, which acts as a probability density function, with each of the ARX models. The parameters of the ARX models and the exponential functions, as well as the unknown time delay, are estimated simultaneously with the expectation-maximization (EM) algorithm using the retarded (delayed) input-output data. A continuous stirred-tank reactor (CSTR) example verifies the proposed identification approach.
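The role of the unknown delay in the retarded input-output regressors can be illustrated with a simpler substitute for the paper's EM: fit a single least-squares ARX model for each candidate delay on a common set of samples and keep the delay with the smallest residual. This grid search, the single local model and the noise-free scheduling are assumptions of the sketch.

```python
import numpy as np


def arx_regressors(y, u, na, nb, d, start):
    """Rows [y[k-1..k-na], u[k-d..k-d-nb+1]] and targets y[k] for k >= start."""
    rows = []
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - d - j] for j in range(nb)]
        rows.append(past_y + past_u)
    return np.array(rows), np.asarray(y[start:])


def fit_with_delay(y, u, na, nb, delays):
    """Least-squares ARX fit for each candidate delay; keep the best one."""
    start = max(na, max(delays) + nb - 1)         # same samples for every candidate
    best = None
    for d in delays:
        Phi, target = arx_regressors(y, u, na, nb, d, start)
        theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
        mse = float(np.mean((target - Phi @ theta) ** 2))
        if best is None or mse < best[2]:
            best = (d, theta, mse)
    return best


# Illustrative system: y[k] = 0.7 y[k-1] + 0.5 u[k-3] + noise (true delay 3).
rng = np.random.default_rng(1)
n = 500
u = rng.standard_normal(n)
y = np.zeros(n)
for k in range(3, n):
    y[k] = 0.7 * y[k - 1] + 0.5 * u[k - 3] + 0.05 * rng.standard_normal()
d_hat, theta_hat, _ = fit_with_delay(y, u, na=1, nb=1, delays=range(1, 8))
print(d_hat, theta_hat)
```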
In this paper, we provide a novel iterative identification algorithm for multi-rate sampled-data systems. As a first step, the procedure identifies a simple initial model from the multi-rate data. Based on this model, the "missing" data points in the slowly sampled measurements are estimated following the expectation-maximization approach. Using the estimated missing data points together with the original data set, a new model is obtained, and this procedure is repeated until the models converge. An attractive feature of the proposed method is its applicability to irregularly sampled data. An application of the proposed method to an industrial data set is also included.
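The iterate-until-convergence loop can be sketched for a first-order model whose output is measured only every third sample. As a simplification of the EM step, missing outputs are filled by one-step model predictions rather than conditional expectations, the initial model is fitted to linearly interpolated outputs, and the model structure and data are illustrative assumptions.

```python
import numpy as np


def fit_first_order(y, u):
    """Least-squares fit of y[k] = a * y[k-1] + b * u[k-1]."""
    Phi = np.column_stack([y[:-1], u[:-1]])
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return theta


def impute(y_obs, observed, u, theta):
    """Fill unobserved outputs with one-step predictions from the previous value."""
    a, b = theta
    y = y_obs.copy()
    for k in range(1, len(y)):
        if not observed[k]:
            y[k] = a * y[k - 1] + b * u[k - 1]
    return y


def identify_multirate(y_obs, observed, u, tol=1e-8, max_iter=200):
    """Alternate between filling missing outputs and re-fitting until the model converges."""
    idx = np.flatnonzero(observed)
    # Initial model: linearly interpolate the slow samples onto the fast grid.
    y = np.interp(np.arange(len(y_obs)), idx, y_obs[idx])
    theta = fit_first_order(y, u)
    for _ in range(max_iter):
        y = impute(y_obs, observed, u, theta)
        new_theta = fit_first_order(y, u)
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta


rng = np.random.default_rng(2)
n = 600
u = rng.standard_normal(n)                        # fast-rate input
y_true = np.zeros(n)
for k in range(1, n):
    y_true[k] = 0.7 * y_true[k - 1] + 0.5 * u[k - 1] + 0.05 * rng.standard_normal()
observed = np.arange(n) % 3 == 0                  # output measured every 3rd sample
y_obs = np.where(observed, y_true, np.nan)
theta_hat = identify_multirate(y_obs, observed, u)
print(theta_hat)
```

Because the loop only needs a boolean `observed` mask, the same code runs unchanged on irregular sampling patterns, provided the first sample is observed.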
This paper is about the nonparametric regression of a choice variable on a nonlinear budget set under utility maximization with general heterogeneity, i.e. in the random utility model (RUM). We show that utility maxim...
Owing to their remarkable advantages, graph neural networks (GNNs) serve as potent tools for embedding graph-structured data and have found applications across many domains. In particular, most GNNs assume that the underlying graph structure is reliable. This assumption, often implicit, can inadvertently lead to the propagation of misleading information through structures such as false links. In response to this challenge, numerous graph structure learning (GSL) methods have been developed. Among them, one popular approach constructs a simple, intuitive K-nearest-neighbor (KNN) graph as a sample from which to infer the true graph structure. However, KNN graphs that follow a single-point distribution can easily mislead the estimation of the true graph structure. The primary reason is statistical: the KNN graph, as a sample, follows a single-point distribution, whereas the true graph structure, as the population, mostly follows a long-tail distribution. In theory, the sample and the population should share the same distribution; otherwise, accurately inferring the true graph structure becomes difficult. To address this problem, this paper proposes Adaptive Graph Structure Estimation with Long-Tail Distributed Implicit Graph (AGSEI). AGSEI comprises three main components: long-tail implicit graph construction, explicit graph structure estimation, and joint optimization. The first component uses a multi-layer graph convolutional network to learn low-order to high-order node representations, compute node similarities, and construct several corresponding long-tail implicit graphs. Because the original, imperfect graph structure can mislead GNNs into propagating false information, it reduces the reliability of the long-tail implicit graphs. AGSEI attempts to limit the aggregation of irrelevant information by introducing the Hilbert-Schmidt independence criterion (HSIC). That is, maximizing the dependenc...
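For readers unfamiliar with the Hilbert-Schmidt independence criterion, the standard biased empirical estimator is trace(K H L H) / (n - 1)^2, where K and L are kernel Gram matrices and H is the centering matrix. The sketch below uses Gaussian kernels with hypothetical bandwidths; how AGSEI applies HSIC to its node representations is not recoverable from the truncated abstract, so this shows only the criterion itself.

```python
import numpy as np


def gaussian_kernel(Z, sigma):
    """Gram matrix K[i, j] = exp(-||z_i - z_j||^2 / (2 sigma^2))."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2 * sigma ** 2))


def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2 (larger means more dependent)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    K = gaussian_kernel(X, sigma_x)
    L = gaussian_kernel(Y, sigma_y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y_dep = X + 0.1 * rng.standard_normal((200, 2))   # strongly dependent on X
Y_ind = rng.standard_normal((200, 2))             # independent of X
print(hsic(X, Y_dep), hsic(X, Y_ind))
```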
Unlike case-control studies, family-based tests for association are protected against population stratification. Complex genetic traits are often governed by quantitative precursors, and it has been argued that analyzing these quantitative precursors instead of the clinical end-point trait may be a more powerful strategy. Although family-based association tests have been developed for single quantitative traits, it is of interest to develop such methods for multivariate phenotypes. We propose a novel transmission-based approach for a trio design that uses simple logistic regression to test for association with a multivariate phenotype. We apply the proposed method to data on systolic and diastolic blood pressure levels provided in Genetic Analysis Workshop 18. However, we find that the bivariate analysis of the two phenotypes did not give more promising results than the univariate analyses, suggesting that different sets of major genetic variants may modulate the two phenotypes.
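One way a transmission-based logistic regression can work is sketched below, under assumptions not stated in the abstract: for each heterozygous parent in a trio, t = 1 if allele A was transmitted to the offspring; under no association every transmission is a fair coin flip, so a no-intercept model logit P(t = 1 | y) = y·β applies and β = 0 can be tested with a likelihood-ratio test. The phenotypes and effect size are simulated for illustration only.

```python
import math

import numpy as np


def fit_logistic_no_intercept(Y, t, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit P(t = 1 | y) = y @ beta (no intercept)."""
    beta = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Y @ beta)))
        grad = Y.T @ (t - p)
        hess = Y.T @ (Y * (p * (1 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def log_lik(Y, t, beta):
    """Bernoulli log-likelihood sum(t * eta - log(1 + exp(eta)))."""
    eta = Y @ beta
    return float(np.sum(t * eta - np.logaddexp(0.0, eta)))


def transmission_lrt(Y, t):
    """LRT of beta = 0; under H0, l(0) = n * log(1/2).

    Returns (statistic, p-value, beta). The closed-form chi-square tail
    exp(-x / 2) is valid only for df = 2, i.e. two phenotypes.
    """
    assert Y.shape[1] == 2, "closed-form chi-square tail below is for df = 2"
    beta = fit_logistic_no_intercept(Y, t)
    stat = 2.0 * (log_lik(Y, t, beta) - log_lik(Y, t, np.zeros(2)))
    return stat, math.exp(-stat / 2.0), beta


rng = np.random.default_rng(3)
n = 2000
sbp = rng.standard_normal(n)                      # standardized, simulated phenotypes
dbp = 0.6 * sbp + 0.8 * rng.standard_normal(n)
Y = np.column_stack([sbp, dbp])
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.5 * sbp))).astype(float)
stat, p_value, beta = transmission_lrt(Y, t)
print(stat, p_value, beta)
```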