ISBN: (Print) 9781713871088
The support/query episodic training strategy has been widely applied in modern meta-learning algorithms. Supposing the n training episodes and the test episodes are sampled independently from the same environment, previous work has derived a generalization bound of O(1/√n) for smooth non-convex functions via algorithmic stability analysis. In this paper, we provide a fine-grained analysis of stability and generalization for modern meta-learning algorithms by considering more general situations. Firstly, we develop matching lower and upper stability bounds for meta-learning algorithms with two types of loss functions: (1) nonsmooth convex functions with α-Hölder continuous subgradients (α ∈ [0, 1)); (2) smooth (including convex and non-convex) functions. Our tight stability bounds show that, in the nonsmooth convex case, meta-learning algorithms can be inherently less stable than in the smooth convex case. For smooth non-convex functions, our stability bound is sharper than the existing one, especially in the setting where the number of iterations is larger than the number n of training episodes. Secondly, we derive improved generalization bounds for meta-learning algorithms that hold with high probability. Specifically, we first demonstrate that, under the independent episode environment assumption, the generalization bound of O(1/√n) obtained via algorithmic stability analysis is near optimal. To attain a faster convergence rate, we show how to yield a faster generalization bound of O(ln n / n) under a curvature condition on the loss functions. Finally, we obtain a generalization bound for meta-learning with dependent episodes whose dependency relation is characterized by a graph. Experiments on regression problems are conducted to verify our theoretical results.
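As a reading aid, the two high-probability rates contrasted in this abstract can be written side by side (n is the number of independently sampled training episodes):

```latex
% Stability-based rate (shown near optimal under independent episodes)
% versus the faster rate obtained under a curvature condition:
O\!\left(\frac{1}{\sqrt{n}}\right)
\quad \longrightarrow \quad
O\!\left(\frac{\ln n}{n}\right)
```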
ISBN: (Print) 9781713871088
Neural networks (NNs) struggle to efficiently solve certain problems, such as learning parities, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own? We exhibit an NN architecture that, in polynomial time, learns as well as any efficient learning algorithm describable by a constant-sized program. For example, on parity problems, the NN learns as well as Gaussian elimination, an efficient algorithm that can be succinctly described. Our architecture combines both recurrent weight sharing between layers and convolutional weight sharing to reduce the number of parameters down to a constant, even though the network itself may have trillions of nodes. While in practice the constants in our analysis are too large to be directly meaningful, our work suggests that the synergy of Recurrent and Convolutional NNs (RCNNs) may be more natural and powerful than either alone, particularly for concisely parameterizing discrete algorithms.
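The constant-parameter point can be made concrete with a toy sketch (the three-tap kernel, depth, and impulse input are invented for illustration, not the paper's construction): one small convolution kernel reused at every layer gives an arbitrarily deep and wide network whose parameter count never grows.

```python
import numpy as np

# Recurrent + convolutional weight sharing: the SAME small kernel is
# applied at every layer, so the parameter count is a constant (3 here)
# regardless of the network's depth or width.
KERNEL = np.array([0.25, 0.5, 0.25])   # the network's only parameters

def rcnn_forward(x, depth):
    for _ in range(depth):                       # recurrent sharing over layers
        x = np.convolve(x, KERNEL, mode="same")  # convolutional sharing in space
        x = np.maximum(x, 0.0)                   # ReLU nonlinearity
    return x

x0 = np.zeros(1000)
x0[500] = 1.0                           # unit impulse input
out = rcnn_forward(x0, depth=50)        # 50 layers, 1000 units wide
n_params = KERNEL.size                  # still 3, at any depth or width
```

Because the kernel is nonnegative and sums to one and the impulse stays far from the boundary, the output mass is conserved while the activation spreads with depth.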
ISBN: (Print) 9789812879905; 9789812879899
A Wireless Sensor Network (WSN) is composed of sensor-equipped devices that sense and process information from the surrounding environment. Energy consumption is the major concern of WSNs. At the same time, quality of service must be considered, especially when dealing with critical WSNs. In this paper, we present a game-theoretic approach that maximizes quality of service, defined as the aggregate frame success rate, while optimizing power allocation. Game theory studies interactions between players (e.g. chess players) who decide on a set of actions (e.g. the players' moves) to reach desired outcomes (e.g. winning the game). Here, we model the system as a potential game. We show that the optimal power allocation, crucial in a heterogeneous sensor network, is a Nash equilibrium of this game, and we discuss its uniqueness. For simulations, we present a fully distributed algorithm that drives the whole system to the optimal power allocation.
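As a hedged sketch of the game-theoretic machinery (not the paper's exact utility model): in an exact potential game, sequential best-response updates, each computable locally by one node, converge to a pure Nash equilibrium. Here each sensor's utility is an assumed log benefit minus a shared convex power cost, a combination that admits an exact potential function.

```python
import numpy as np

GRID = np.linspace(0.0, 2.0, 101)      # finite power levels per sensor

def utility(p_i, total_others):
    # Assumed payoff: log benefit of own power minus a network-wide
    # convex cost. Phi(p) = sum_j log(1+p_j) - 0.25*(sum_j p_j)^2 is
    # then an EXACT potential: any unilateral change moves Phi by the
    # same amount as the deviating player's utility.
    return np.log1p(p_i) - 0.25 * (p_i + total_others) ** 2

def best_response(p, i):
    others = p.sum() - p[i]
    return GRID[np.argmax([utility(q, others) for q in GRID])]

def best_response_dynamics(n=4, seed=0, max_rounds=200):
    p = np.random.default_rng(seed).choice(GRID, size=n)
    for _ in range(max_rounds):
        changed = False
        for i in range(n):                       # one local update at a time
            br = best_response(p, i)
            if br != p[i]:
                p[i], changed = br, True
        if not changed:                          # fixed point: pure Nash eq.
            return p
    raise RuntimeError("did not converge")

p_star = best_response_dynamics()
```

Each improving move strictly increases the potential, and the strategy space is finite, so the dynamics must terminate at an allocation from which no sensor can unilaterally improve.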
ISBN: (Print) 0780383591
This paper studies kernel regression problems. The focus is on kernel algorithms that use the least squares criterion, and on developing methods so that the solution in the dual observation space intelligently chooses training examples. The Least Squares Support Vector Machine (LS-SVM) and its variants have attracted researchers because the solution to nonlinear problems can be formulated as an optimization problem that involves solving a set of linear equations in the primal or dual space. A drawback of the LS-SVM is that the solution is not sparse: it involves a set of linear equations in the dual space whose size depends on the number of observations. This paper discusses an on-line algorithm that selectively chooses to add and delete training observations. Through examples, we show that this algorithm can outperform LS-SVM solutions that use a larger window of randomly chosen training examples.
Estimating heterogeneous treatment effects (HTE) from observational studies is rising in importance due to the widespread accumulation of data in many fields. Because of the selection bias behind the inaccessibility of counterfactual data, the problem differs fundamentally from supervised learning in a challenging way. However, existing works on modeling selection bias and the corresponding algorithms do not naturally generalize to non-binary treatment spaces. To address this limitation, we propose to use mutual information to describe selection bias in estimating HTE and derive a novel error bound using the mutual information between the covariates and the treatments, which is the first error bound to cover general treatment schemes, including multinoulli or continuous spaces. We then present a theoretically justified algorithm, the Mutual Information Treatment Network (MitNet), which uses adversarial optimization to reduce selection bias and obtain more accurate HTE estimations. Our algorithm achieves remarkable performance in both simulation studies and empirical evaluations.
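A toy numeric illustration of the abstract's central quantity (the two-level discretization of covariates and treatments is an assumption for the demo, not MitNet itself): the mutual information I(X; T) between covariates and treatments vanishes under randomized assignment and grows as the treatment choice depends on the covariates, i.e. as selection bias appears.

```python
import numpy as np

def mutual_information(joint):
    # joint: 2-D array of joint probabilities p(x, t); returns MI in nats
    px = joint.sum(axis=1, keepdims=True)        # marginal p(x)
    pt = joint.sum(axis=0, keepdims=True)        # marginal p(t)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ pt)[mask])).sum())

# Randomized assignment: treatment T independent of covariate X
randomized = np.outer([0.5, 0.5], [0.3, 0.7])
# Biased assignment: treatment strongly depends on the covariate
biased = np.array([[0.45, 0.05],
                   [0.05, 0.45]])

mi_rand = mutual_information(randomized)   # ~0: no selection bias
mi_bias = mutual_information(biased)       # > 0: selection bias present
```

In this reading, reducing I(X; T) via adversarial optimization pushes the observational assignment toward the randomized case, for any treatment space on which the mutual information is defined.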
ISBN: (Print) 9781424419456
There are several learning methods suitable for neural networks. In this paper, two of them are described: Back-propagation (BP) and Genetic Algorithms (GA). These learning methods are compared and are used for the control of modern telecommunication network nodes.
ISBN: (Print) 0852966903
A class of unsupervised algorithms known as competitive learning (CL) was investigated for its application as an adaptive control mechanism for an educational toy. Two variants of CL were used: hard competitive learning (HCL) and soft competitive learning (SCL). It was clearly shown that CL was suitable for the unsupervised clustering needed in an autonomous robotic toy. SCL was found to outperform HCL in the more challenging test cases examined. Furthermore, simulations indicated that radial basis functions may be used within the constraints of the hardware system if the exponential function is replaced with a lookup-table equivalent of at least 15 elements.
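A minimal HCL (winner-take-all) sketch, assuming made-up three-cluster data, seeding, and learning rate; it shows the unsupervised clustering behaviour the abstract relies on: for each sample, only the nearest prototype adapts.

```python
import numpy as np

def hcl(X, W0, lr=0.1, epochs=20, seed=0):
    # Hard competitive learning: online winner-take-all prototype update.
    rng = np.random.default_rng(seed)
    W = W0.copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            j = np.argmin(((W - x) ** 2).sum(axis=1))  # winning prototype
            W[j] += lr * (x - W[j])                    # only the winner moves
    return W

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(c, 0.1, size=(50, 2)) for c in centers])
W0 = X[[0, 50, 100]]          # seed one prototype per region for a stable demo
W = hcl(X, W0)                # prototypes drift toward the cluster centres
```

SCL differs only in that every prototype adapts with a weight that decays with its distance (or rank), which softens the dead-unit problem this hard variant can exhibit with unlucky initialization.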
ISBN: (Print) 0780331923
In this paper, a nonlinear adaptive filter for use in steel mill applications is described. This filter takes the form of a generalized regression network and is used to remove eccentricities from a steel mill force signal. Online adaptation is achieved by means of standard recursive parameter update algorithms suitable for linear regression type models. It is demonstrated that second order methods can lead to severe parameter biasing effects and a more serious effect known as bursting, whereby the parameter estimation becomes numerically unstable. This paper discusses the above effects from a theoretical and practical viewpoint, and considers the suitability of several learning algorithms for the eccentricity filtering application. This analysis leads to a numerically robust filtering structure, the efficacy of which is demonstrated by means of results from a real steel rolling mill.
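One of the "standard recursive parameter update algorithms" alluded to above is recursive least squares (RLS) with a forgetting factor; the following sketch uses an invented two-parameter linear system purely for illustration, not the paper's filter.

```python
import numpy as np

def rls_step(theta, P, x, y, lam=0.99):
    # One RLS update with forgetting factor lam.
    # x: regressor vector, y: scalar observation, P: inverse-covariance-like
    # matrix whose growth (under poor excitation) underlies bursting.
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector
    e = y - x @ theta                  # a priori prediction error
    theta = theta + k * e              # parameter update
    P = (P - np.outer(k, Px)) / lam    # covariance update
    return theta, P

rng = np.random.default_rng(0)
true_theta = np.array([0.5, -1.5])     # invented "plant" parameters
theta = np.zeros(2)
P = np.eye(2) * 100.0                  # large initial uncertainty
for _ in range(500):
    x = rng.normal(size=2)             # persistently exciting regressors
    y = x @ true_theta + 0.01 * rng.normal()
    theta, P = rls_step(theta, P, x, y)
```

With persistently exciting regressors the estimate converges; the bursting phenomenon the paper analyzes arises when excitation is poor and P grows until the update becomes numerically unstable.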
ISBN: (Print) 9781943580705
An RCF design methodology for minimizing the inter-MG coupling coefficient is proposed based on machine-learning algorithms. A lower coupling coefficient can be realized by the proposed methodology, compared with that of the previously reported RCF.
ISBN: (Print) 9781956792034
Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis of stochastic gradient / Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, and substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue, we first propose a novel information-theoretic measure, the kernelized Rényi entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon's entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish generalization error bounds for SGD/SGLD under the kernelized Rényi entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretic bounds depend on the statistics of the stochastic gradients evaluated along the iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.
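The matrix-based flavour of this idea can be sketched as follows (kernel choice, bandwidth, and α = 2 are illustrative assumptions, not the paper's exact estimator): the entropy is computed from the eigenvalues of a trace-normalised kernel Gram matrix, so no density estimation in the input dimension is ever needed.

```python
import numpy as np

def renyi_entropy(X, alpha=2.0, sigma=1.0):
    # Matrix-based Renyi entropy of order alpha from an RBF Gram matrix:
    # S_alpha = log2(sum_i lam_i^alpha) / (1 - alpha), where lam_i are
    # the eigenvalues of K / tr(K) (they are nonnegative and sum to 1).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    K /= np.trace(K)                               # trace normalisation
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clip tiny negatives
    return float(np.log2((lam ** alpha).sum()) / (1.0 - alpha))

rng = np.random.default_rng(0)
H_spread = renyi_entropy(rng.normal(0.0, 2.0, size=(100, 5)))   # diffuse sample
H_tight = renyi_entropy(rng.normal(0.0, 0.05, size=(100, 5)))   # concentrated
```

A diffuse sample yields a near-uniform eigenvalue spectrum and hence high entropy, while a tightly concentrated sample yields one dominant eigenvalue and entropy near zero; note the computation touches only the n × n Gram matrix, never the input dimension.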