Probabilistic planning captures the uncertainty of plan execution by probabilistically modeling the effects of actions in the environment, and therefore the probability of reaching different states from a given state ...
ISBN:
(纸本)9781627480031
Probabilistic planning captures the uncertainty of plan execution by probabilistically modeling the effects of actions in the environment, and therefore the probability of reaching different states from a given state and action. In order to compute a solution for a probabilistic planning problem, planners need to manage the uncertainty associated with the different paths from the initial state to a goal state. Several approaches to manage uncertainty were proposed, e.g., consider all paths at once, perform determinization of actions, and sampling. In this paper, we introduce trajectory-based short-sighted Stochastic Shortest Path Problems (SSPs), a novel approach to manage uncertainty for probabilistic planning problems in which states reachable with low probability are substituted by artificial goals that heuristically estimate their cost to reach a goal state. We also extend the theoretical results of Short-Sighted Probabilistic Planner (SSiPP) [1] by proving that SSiPP always finishes and is asymptotically optimal under sufficient conditions on the structure of short-sighted SSPs. We empirically compare SSiPP using trajectory-based short-sighted SSPs with the winners of the previous probabilistic planning competitions and other state-of-the-art planners in the triangle tireworld problems. Trajectory-based SSiPP outperforms all the competitors and is the only planner able to scale up to problem number 60, a problem in which the optimal solution contains approximately 10~(70) states.
Recent advances in genome-wide identification of protein-protein interactions (PPIs) have produced an abundance of interaction data which give an insight into functional associations among proteins. However, it is kno...
详细信息
Recent advances in genome-wide identification of protein-protein interactions (PPIs) have produced an abundance of interaction data which give an insight into functional associations among proteins. However, it is known that the PPI datasets determined by high-throughput experiments or inferred by computational methods include an extremely large number of false positives. Using Gene Ontology (GO) and its annotations, we assess reliability of the PPIs by considering the semantic similarity of interacting proteins. Protein pairs with high semantic similarity are considered highly likely to share common functions, and therefore, are more likely to interact. We analyze the performance of existing semantic similarity measures in terms of functional consistency and propose a combined method that achieves improved performance over existing methods. The semantic similarity measures are applied to identify false positive PPIs. The classification results show that the combined hybrid method has higher accuracy than the other existing measures. Furthermore, the combined hybrid classifier predicts that 59.6% of the S. cerevisiae PPIs from the BioGRID database are false positives.
A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highl...
ISBN:
(纸本)9781627480031
A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models.
We describe a Bayesian method for group feature selection in linear regression problems. The method is based on a generalized version of the standard spike-and-slab prior distribution which is often used for individua...
详细信息
We describe a Bayesian method for group feature selection in linear regression problems. The method is based on a generalized version of the standard spike-and-slab prior distribution which is often used for individual feature selection. Exact Bayesian inference under the prior considered is infeasible for typical regression problems. However, approximate inference can be carried out efficiently using Expectation Propagation (EP). A detailed analysis of the generalized spike-and-slab prior shows that it is well suited for regression problems that are sparse at the group level. Furthermore, this prior can be used to introduce prior knowledge about specific groups of features that are a priori believed to be more relevant. An experimental evaluation compares the performance of the proposed method with those of group LASSO, Bayesian group LASSO, automatic relevance determination and additional variants used for group feature selection. The results of these experiments show that a model based on the generalized spike-and-slab prior and the EP algorithm has state-of-the-art prediction performance in the problems analyzed. Furthermore, this model is also very useful to carry out sequential experimental design (also known as active learning), where the data instances that are most informative are iteratively included in the training set, reducing the number of instances needed to obtain a particular level of prediction accuracy.
Least Square Support Vector machine (LS-SVM) converts the hinge loss function of SVM into a least square loss function which simplified the original quadratic programming training method to a linear system solving pro...
详细信息
Least Square Support Vector machine (LS-SVM) converts the hinge loss function of SVM into a least square loss function which simplified the original quadratic programming training method to a linear system solving problem. Sparse LS-SVM is obtained with a pruning procedure. The performance of sparse LS-SVM depends on the selection of hyper-parameters (i.e. kernel and penalty parameters). Currently, CV and LOO are the most common methods to select hyper-parameters for LS-SVM. However, CV is computationally expensive while LOO yields a high variance of validation error which may mislead the selection of hyper-parameters. Selecting both kernel and penalty parameters simultaneously needs to search in a high dimensional parameter space. In this work, we propose a new two-step hyper-parameter selection method. Distance between Two Classes (DBTC) method is adopted to select the kernel parameters based on a maximization of between-class separation of the projected samples in the feature space. However, data distribution could not be helpful in penalty parameter selection. Therefore, we propose to select the penalty parameter via a minimization of a Localized Generalization Error to enhance the generalization capability of the LS-SVM. Experimental results comparing to existing methods show the proposed two-step method yields better LS-SVMs in term of average testing accuracies.
We prove a new exponential concentration inequality for a plug-in estimator of the Shannon mutual information. Previous results on mutual information estimation only bounded expected error. The advantage of having the...
详细信息
ISBN:
(纸本)9781627480031
We prove a new exponential concentration inequality for a plug-in estimator of the Shannon mutual information. Previous results on mutual information estimation only bounded expected error. The advantage of having the exponential inequality is that, combined with the union bound, we can guarantee accurate estimators of the mutual information for many pairs of random variables simultaneously. As an application, we show how to use such a result to optimally estimate the density function and graph of a distribution which is Markov to a forest graph.
Recurrent neural networks are successfully used for tasks like time series processing and system identification. Many of the approaches to train these networks, however, are often regarded as too slow, too complicated...
详细信息
ISBN:
(纸本)9781467314886
Recurrent neural networks are successfully used for tasks like time series processing and system identification. Many of the approaches to train these networks, however, are often regarded as too slow, too complicated, or both. Reservoir computing methods like echo state networks or liquid state machines are an alternative to the more traditional approaches. Echo state networks have the appeal that they are simple to train, and that they have shown to be able to produce excellent results for a number of benchmarks and other tasks. One disadvantage of echo state networks, however, is the high variability in their performance due to a randomly connected hidden layer. Ideally, an efficient and more deterministic way to create connections in the hidden layer could be found, with a performance better than randomly connected hidden layers but without excessively iterating over the same training data many times. We present an approach - tamed reservoirs - that makes use of efficient feedforward training methods, and performs better than echo state networks for some time series prediction tasks. Moreover, our approach reduces some of the variability since all recurrent connections in the network are trained.
In many applications, classification systems often require human intervention in the loop. In such cases the decision process must be transparent and comprehensible, simultaneously requiring minimal assumptions on the...
详细信息
ISBN:
(纸本)9781627480031
In many applications, classification systems often require human intervention in the loop. In such cases the decision process must be transparent and comprehensible, simultaneously requiring minimal assumptions on the underlying data distributions. To tackle this problem, we formulate an axis-aligned subspace-finding task under the assumption that query specific information dictates the complementary use of the subspaces. We develop a regression-based approach called RECIP that efficiently solves this problem by finding projections that minimize a nonparametric conditional entropy estimator. Experiments show that the method is accurate in identifying the informative projections of the dataset, picking the correct views to classify query points, and facilitates visual evaluation by users.
In many database applications, ranking queries may reference both text and numeric attributes, where the ranking functions are based on both semantic distances/similarities for text attributes and numeric distances fo...
详细信息
In many database applications, ranking queries may reference both text and numeric attributes, where the ranking functions are based on both semantic distances/similarities for text attributes and numeric distances for numeric attributes. In this paper, we propose a new method for evaluating such type of ranking queries over a relational database. By statistics and training, this method builds a mechanism that combines the semantic and numeric distances, and the mechanism can be used to balance the effects of text attributes and numeric attributes on matching a given query and tuples in database search. The basic idea of the method is to create an index based on WordNet to expand the tuple words semantically for text attributes and on the information of numeric attributes. The candidate results for a query are retrieved by the index and a simple SQL selection statement, and then top-N answers are obtained. The results of extensive experiments indicate that the performance of this new strategy is efficient and effective.
In this paper, we propose a new method based on index to realize IR-style Chinese keyword search with ranking strategies in relational databases. This method creates an index by using the related information of tuple ...
详细信息
In this paper, we propose a new method based on index to realize IR-style Chinese keyword search with ranking strategies in relational databases. This method creates an index by using the related information of tuple words and presents a ranking strategy in terms of the nature of Chinese words. For a Chinese keyword query, the index is used to match query search words and the tuple words in index quickly, and to compute similarities between the query and tuples by the ranking strategy, and then the set of identifiers of candidate tuples is generated. Thus, we retrieve top-N results of the query using SQL selection statements and output the ranked answers according to the similarities. The experimental results show that our method is efficient and effective.
暂无评论