With the abundance of data, machine learning applications have attracted increasing attention in the last decade. An attractive approach to robustifying the statistical analysis is to preprocess the data through clustering. This paper develops a low-complexity Riemannian optimization framework for solving optimization problems on the set of positive semidefinite stochastic matrices. The low-complexity feature of the proposed algorithms stems from factorizing the optimization variable as X = YY^T and deriving conditions on the number of columns of Y under which the factorization yields a satisfactory solution. The paper further investigates the embedded and quotient geometries of the resulting Riemannian manifolds. In particular, the paper explicitly derives the tangent space, Riemannian gradients and Hessians, and a retraction operator, allowing the design of efficient first- and second-order optimization methods for the graph-based clustering applications of interest. The numerical results reveal that the resulting algorithms present a clear complexity advantage compared with state-of-the-art Euclidean and Riemannian approaches for graph clustering applications.
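The complexity advantage comes from updating the n x p factor Y rather than the full n x n matrix X. A minimal sketch of the factorization idea, assuming a generic smooth cost and ignoring the stochasticity constraint and the retraction derived in the paper (the function names and step sizes below are ours, for illustration only):

```python
import numpy as np

def factored_gradient_step(Y, grad_X, step):
    """One gradient step on the factor Y for f(X) with X = Y @ Y.T.

    By the chain rule, the Euclidean gradient in Y is
    grad_Y f(YY^T) = 2 * grad_X f(YY^T) @ Y, so all iterates stay
    n x p instead of n x n -- the source of the complexity advantage.
    """
    G = grad_X(Y @ Y.T)            # n x n Euclidean gradient in X
    return Y - step * 2.0 * G @ Y  # n x p update

# Toy usage: minimize ||YY^T - A||_F^2 for a rank-p PSD target A,
# so the error can be driven toward zero.
rng = np.random.default_rng(0)
n, p = 30, 3
B = rng.standard_normal((n, p))
A = B @ B.T
grad = lambda X: 2.0 * (X - A)     # gradient of ||X - A||_F^2 in X

Y = rng.standard_normal((n, p))
for _ in range(3000):
    Y = factored_gradient_step(Y, grad, step=2e-4)
print(np.linalg.norm(Y @ Y.T - A))  # approximation error decreases
```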
Demand forecasting plays an important role in the deployment of mobile clinic services to vulnerable communities such as school zones and census tracts, as it can help the service provider maximize its coverage under limited resources. In this paper, we consider the problem of predicting vaccination delinquency in schools and census tracts. This problem is rather challenging because delinquency is only observed in schools, for which very limited information is available; while rich demographic and economic information is available for census tracts, no observations of delinquency have been made at the census tract level. To address this challenge, we first develop a hierarchical approach to forecast the demand for vaccinations in schools and census tracts. In the first stage of the hierarchical approach, we solve a linear optimization model to compute an association matrix that aligns common features in both census tracts and school zones. Then we use the estimated association to develop a forecasting model to predict the vaccination delinquency in both schools and census tracts. A non-convex quadratic optimization (QO) model is also proposed to find the association matrix and the forecasting model simultaneously. We also introduce an alternating update scheme for the non-convex QO and establish the convergence of the algorithm. Moreover, the two association matrices generated from the proposed approaches can be used to impute the information in the school zone data, which further allows us to apply existing forecasting models to predict the demand in school zones based on the imputed data. A case study from the Houston Independent School District (HISD) and its associated communities is reported to demonstrate the efficacy of the new models and techniques.
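A minimal sketch of an alternating update of this kind, using block gradient steps on a regularized least-squares surrogate of the non-convex QO. The variable names (association matrix W, tract features Z, forecasting coefficients beta) and the omission of any structural constraints on W (e.g., nonnegativity) are our assumptions, not the paper's exact formulation:

```python
import numpy as np

def alternating_qo(y, Z, lam=0.1, step=1e-3, iters=2000, seed=0):
    """Alternate block-gradient updates on the non-convex objective
        ||y - W Z beta||^2 + lam (||W||_F^2 + ||beta||^2),
    where W (schools x tracts) plays the role of an association matrix
    and beta a linear forecasting model on tract features Z."""
    rng = np.random.default_rng(seed)
    s, = y.shape
    m, d = Z.shape
    W = rng.standard_normal((s, m)) / np.sqrt(m)
    beta = rng.standard_normal(d) / np.sqrt(d)
    for _ in range(iters):
        v = Z @ beta                                  # tract-level scores
        r = W @ v - y                                 # school-level residual
        W -= step * (2 * np.outer(r, v) + 2 * lam * W)          # W-block
        r = W @ (Z @ beta) - y
        beta -= step * (2 * Z.T @ (W.T @ r) + 2 * lam * beta)   # beta-block
    return W, beta

# Toy usage with synthetic school observations y and tract features Z.
rng = np.random.default_rng(1)
Z = rng.standard_normal((40, 6))                   # 40 tracts, 6 features
W_true = np.abs(rng.standard_normal((25, 40))) / 40
y = W_true @ (Z @ np.ones(6))                      # 25 schools
W, beta = alternating_qo(y, Z)
print(np.linalg.norm(W @ (Z @ beta) - y))          # residual after fitting
```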
The sparse prior has been widely adopted to establish data models for numerous applications. In this context, most of these models are based on one of three foundational paradigms: the conventional sparse representation, the ...
First-order optimization algorithms play a major role in large-scale machine learning. A new class of methods, called adaptive algorithms, was recently introduced to iteratively adjust the learning rate for each coordinate. Despite great practical success in deep learning, their behavior and performance on more general loss functions are not well understood. In this paper, we derive a non-autonomous system of differential equations, which is the continuous-time limit of adaptive optimization methods. We study the convergence of its trajectories and give conditions under which the differential system, underlying all adaptive algorithms, is suitable for optimization. We discuss convergence to a critical point in the non-convex case and give conditions for the dynamics to avoid saddle points and local maxima. For convex loss functions, we introduce a suitable Lyapunov functional which allows us to study the rate of convergence. Several other properties of both the continuous and discrete systems are briefly discussed. The differential system studied in the paper is general enough to encompass many other classical algorithms (such as Heavy Ball and Nesterov's accelerated method) and allows us to recover several known results for these algorithms.
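One way to see such a continuous-time limit concretely is to integrate an Adam-like non-autonomous system by forward Euler. The specific system below, with constant averaging rates a and b, is a plausible instance chosen for illustration; the exact system analyzed in the paper may differ, e.g., in its time-dependent coefficients:

```python
import numpy as np

def adaptive_ode_trajectory(grad, x0, T=20.0, h=1e-3, a=10.0, b=10.0, eps=1e-8):
    """Forward-Euler integration of an Adam-like dynamical system

        m' = a (grad f(x) - m),   v' = b (grad f(x)^2 - v),
        x' = -m / sqrt(v + eps),

    where m and v are running averages of the gradient and its square,
    mirroring the coordinate-wise learning-rate adjustment of adaptive
    methods in continuous time."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for _ in range(int(T / h)):
        g = grad(x)
        m += h * a * (g - m)          # first-moment dynamics
        v += h * b * (g * g - v)      # second-moment dynamics
        x -= h * m / np.sqrt(v + eps)  # adaptively scaled descent
    return x

# Toy check on a convex quadratic: the trajectory approaches the minimizer.
f_grad = lambda x: 2.0 * x
print(adaptive_ode_trajectory(f_grad, x0=[3.0, -2.0]))
```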
We consider several line-search-based gradient methods for stochastic optimization: a gradient method and an accelerated gradient method for convex optimization, and a gradient method for non-convex optimization. The methods simultaneously adapt to the unknown Lipschitz constant of the gradient and to the variance of the stochastic approximation of the gradient. The focus of this paper is to numerically compare such methods with state-of-the-art adaptive methods that are based on a different idea of using the norm of the stochastic gradient to define the stepsize, e.g., AdaGrad and Adam. Copyright (C) 2020 The Authors.
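A generic sketch of the line-search idea, assuming exact function values for simplicity (a slack term delta would absorb the stochastic error the paper adapts to). This is a standard adaptive-L backtracking scheme, not necessarily the paper's exact method:

```python
import numpy as np

def adaptive_L_gradient(f, grad, x0, L0=1.0, iters=500, delta=0.0):
    """Gradient descent with a backtracking estimate of the unknown
    Lipschitz constant L: accept the step x - g/L when the standard
    descent inequality holds (up to slack delta), otherwise double L;
    after an accepted step, optimistically halve L for the next one."""
    x, L = np.asarray(x0, float), L0
    for _ in range(iters):
        g = grad(x)
        while True:
            y = x - g / L
            if f(y) <= f(x) - np.dot(g, g) / (2 * L) + delta:
                break          # sufficient decrease holds; accept
            L *= 2.0           # L estimate too small; backtrack
        x, L = y, L / 2.0
    return x

# Toy usage on a quadratic whose curvature is "unknown" to the method.
A = np.diag([1.0, 10.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(adaptive_L_gradient(f, grad, x0=np.ones(3)))
```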
Near-isometric orthogonal embeddings into lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimize the maximum distortion for a given set of points. We formulate the problem as a nonconvex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create a dual relaxation. We also suggest a polynomial-time algorithm based on the theory of convex optimization to provably solve the dual relaxation. We provide a theoretical upper bound on the approximation guarantees for our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm compared to baselines in terms of scalability and the ability to achieve lower distortion.
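To make the objective concrete, the following sketch evaluates the maximum distortion of an orthonormal embedding over a set of difference vectors, using PCA as a baseline embedding. The normalization convention and the use of consecutive differences are our assumptions; the paper optimizes this quantity rather than merely evaluating it:

```python
import numpy as np

def max_distortion(V, P):
    """Maximum distortion of an orthonormal embedding P (k x d, rows
    orthonormal) over a set of difference vectors V:
        max_i | ||P v_i / ||v_i|| ||^2 - 1 |,
    i.e., the worst-case squared-length change over unit directions."""
    U = V / np.linalg.norm(V, axis=1, keepdims=True)
    return np.abs(np.sum((U @ P.T) ** 2, axis=1) - 1.0).max()

# Baseline: top principal directions as the orthonormal embedding.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20)) @ np.diag(np.linspace(1, 3, 20))
V = X[1:] - X[:-1]                        # a subset of pairwise differences
_, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
P = Vt[:5]                                # top-5 principal directions
print(max_distortion(V, P))
```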
We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called "path coding" penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs.
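The path selection subproblems inside such proximal operators ultimately rest on shortest-path computations over the DAG. A minimal sketch of that kernel, with hypothetical node and edge costs, leaving out the flow decomposition the paper builds on top of it:

```python
def cheapest_path(n, edges, node_cost):
    """Minimum-cost source-to-sink path in a DAG by dynamic programming
    over a topological order (nodes assumed numbered 0..n-1 topologically,
    source 0, sink n-1). edges maps u -> list of (v, edge_cost)."""
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = node_cost[0]
    for u in range(n):                      # topological sweep
        for v, w in edges.get(u, []):
            c = best[u] + w + node_cost[v]  # relax edge (u, v)
            if c < best[v]:
                best[v], parent[v] = c, u
    path, v = [], n - 1                     # reconstruct the optimal path
    while v != -1:
        path.append(v)
        v = parent[v]
    return best[-1], path[::-1]

# Toy DAG: 0 -> {1, 2} -> 3, with node and edge costs.
edges = {0: [(1, 0.5), (2, 0.2)], 1: [(3, 0.1)], 2: [(3, 0.4)]}
print(cheapest_path(4, edges, node_cost=[0.0, 1.0, 0.3, 0.0]))
```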
Motivated by the need for fast computations in wireless sensor networks, the new F-Lipschitz optimization theory is introduced for a novel class of optimization problems. These problems are defined by simple qualifying properties specified in terms of an increasing objective function and contractive constraints. It is shown that feasible F-Lipschitz problems always have a unique optimal solution that satisfies the constraints at equality. The solution is obtained quickly by asynchronous algorithms with certified convergence. F-Lipschitz optimization can be applied to both centralized and distributed optimization. Compared to traditional Lagrangian methods, which often converge linearly, the convergence of centralized F-Lipschitz methods is at least superlinear. Distributed F-Lipschitz algorithms converge fast, as opposed to traditional Lagrangian decomposition and parallelization methods, which generally converge slowly and at the price of many message passings. In both cases, the computational complexity is much lower than that of traditional Lagrangian methods. Examples of applying the new optimization method are given for distributed estimation and radio power control in wireless sensor networks. The drawback of F-Lipschitz optimization is that it might be difficult to check the qualifying properties. For more general optimization problems, it is suggested that it is convenient to have conditions ensuring that the solution satisfies the constraints at equality.
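Because the optimum satisfies the contractive constraints at equality, it is a fixed point x* = F(x*), so a simple fixed-point iteration converges geometrically. A minimal synchronous sketch (the paper's asynchronous variant lets each node update its own coordinate with outdated neighbor values; the toy constraint map below is illustrative, not from the paper):

```python
import numpy as np

def f_lipschitz_solve(F, x0, tol=1e-10, max_iter=1000):
    """Fixed-point iteration x <- F(x) for an F-Lipschitz problem whose
    optimum satisfies the constraints at equality; contractiveness of F
    certifies geometric convergence to the unique solution."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = F(x)
        if np.max(np.abs(x_new - x)) < tol:  # sup-norm stopping test
            return x_new
        x = x_new
    return x

# Toy contractive constraint map, loosely in the spirit of a radio
# power control update: x_i = 0.3 * mean(x) + c_i.
c = np.array([1.0, 0.5, 0.2])
F = lambda x: 0.3 * np.mean(x) * np.ones_like(x) + c
print(f_lipschitz_solve(F, x0=np.zeros(3)))
```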