Privacy preservation is a challenging problem in decentralized nonconvex optimization involving sensitive data. Prior approaches to decentralized nonconvex optimization are either not strong enough to protect privacy or exhibit low utility under a high privacy guarantee. To address these issues, we propose a differentially private linearized alternating direction method of multipliers (DP-LADMM), which achieves fast convergence for nonconvex objective functions while avoiding saddle points and local maxima under a differential privacy guarantee. We also apply the Analytic Gaussian Mechanism to track the cumulative privacy loss and provide a tight global differential privacy guarantee for DP-LADMM. The theoretical analysis offers an explicit convergence rate for our algorithm. To the best of our knowledge, this is the first paper to provide an explicit convergence rate for decentralized nonconvex optimization with differential privacy and saddle/maximum avoidance. Numerical simulations and comparison studies on decentralized estimation confirm the superiority of the algorithm and the effectiveness of global privacy preservation.
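To make the privacy mechanism concrete, the sketch below shows one linearized-ADMM-style primal/dual update at a single node, with Gaussian noise added only to the copy of the variable that is shared with neighbors. It is a minimal sketch, not the paper's DP-LADMM: the update form, step sizes, and function names are assumptions, and the noise scale uses the classical Gaussian-mechanism calibration as a stand-in for the tighter Analytic Gaussian Mechanism the abstract refers to.

```python
"""Illustrative sketch only (not the paper's exact DP-LADMM): one node's update
in a decentralized, linearized ADMM loop where Gaussian noise perturbs the
message that leaves the node. All names and constants are hypothetical."""
import numpy as np

def gaussian_noise_scale(sensitivity, eps, delta):
    # Classical Gaussian-mechanism calibration (valid for eps <= 1); the
    # Analytic Gaussian Mechanism used in the paper gives a tighter sigma.
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

def dp_linearized_admm_step(x_i, dual_i, neighbor_avg, grad_f_i, rho, eta,
                            sensitivity, eps, delta, rng):
    """One linearized-ADMM-style primal/dual update at node i.

    x_i          : current local primal variable
    dual_i       : local dual variable enforcing consensus with neighbors
    neighbor_avg : average of the (noisy) copies received from neighbors
    grad_f_i     : gradient of the local (possibly nonconvex) objective at x_i
    """
    # Linearized primal step: replace the exact x-subproblem with one gradient
    # step on f_i plus the augmented-Lagrangian consensus terms.
    x_new = x_i - eta * (grad_f_i + dual_i + rho * (x_i - neighbor_avg))
    # Dual ascent on the consensus constraint.
    dual_new = dual_i + rho * (x_new - neighbor_avg)
    # Perturb only the copy broadcast to neighbors (differential privacy).
    sigma = gaussian_noise_scale(sensitivity, eps, delta)
    x_shared = x_new + rng.normal(0.0, sigma, size=x_new.shape)
    return x_new, dual_new, x_shared

# Toy usage: a single step on a quadratic local objective ||x - a||^2.
rng = np.random.default_rng(0)
x, dual = np.zeros(3), np.zeros(3)
neighbor_avg = np.ones(3)                       # pretend this arrived from neighbors
grad = 2.0 * (x - np.array([1.0, -1.0, 0.5]))   # gradient of the local objective
x, dual, shared = dp_linearized_admm_step(x, dual, neighbor_avg, grad,
                                          rho=1.0, eta=0.1, sensitivity=1.0,
                                          eps=0.5, delta=1e-5, rng=rng)
print(shared)
```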
In this article, we study decentralized nonconvex finite-sum minimization problems described over a network of nodes, where each node possesses a local batch of data samples. In this context, we analyze a single-timescale randomized incremental gradient method, called GT-SAGA. GT-SAGA is computationally efficient as it evaluates one component gradient per node per iteration and achieves provably fast and robust performance by leveraging node-level variance reduction and network-level gradient tracking. For general smooth nonconvex problems, we show the almost sure and mean-squared convergence of GT-SAGA to a first-order stationary point and further describe regimes of practical significance where it outperforms the existing approaches and where its iteration complexity is independent of the network topology, respectively. When the global function satisfies the Polyak-Łojasiewicz condition, we show that GT-SAGA exhibits linear convergence to an optimal solution in expectation and describe regimes of practical interest where the performance is independent of the network topology and improves upon the existing methods. Numerical experiments are included to highlight the main convergence aspects of GT-SAGA in nonconvex settings.
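As a rough illustration of the two ingredients the abstract names, a SAGA gradient table per node and a network-level gradient tracker, here is a minimal sketch on a toy decentralized least-squares problem. The mixing matrix, step size, data, and the exact ordering of the updates are assumptions and are not taken from the paper.

```python
"""Illustrative sketch of the GT-SAGA ingredients (per-node SAGA variance
reduction plus gradient tracking) on a toy decentralized least-squares
problem. Constants and problem data are hypothetical and untuned."""
import numpy as np

rng = np.random.default_rng(1)
n_nodes, m_local, d = 4, 10, 3          # nodes, samples per node, dimension

# Node i holds (A[i], b[i]); f_i(x) = (1/m) * sum_j 0.5 * (a_j^T x - b_j)^2.
A = rng.normal(size=(n_nodes, m_local, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=(n_nodes, m_local))

def comp_grad(i, j, x):
    # Gradient of the j-th component function at node i.
    return A[i, j] * (A[i, j] @ x - b[i, j])

# Doubly stochastic mixing matrix for a 4-node ring.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

alpha = 0.02
x = np.zeros((n_nodes, d))
table = np.array([[comp_grad(i, j, x[i]) for j in range(m_local)]
                  for i in range(n_nodes)])          # SAGA gradient table
g_prev = table.mean(axis=1)                          # last local estimator
y = g_prev.copy()                                    # gradient-tracking variable

for k in range(600):
    x = W @ x - alpha * y                            # consensus + tracked descent
    g_new = np.empty_like(g_prev)
    for i in range(n_nodes):
        j = rng.integers(m_local)                    # one component gradient per node
        gij = comp_grad(i, j, x[i])
        g_new[i] = gij - table[i, j] + table[i].mean(axis=0)   # SAGA estimator
        table[i, j] = gij                            # refresh the table entry
    y = W @ y + g_new - g_prev                       # gradient-tracking update
    g_prev = g_new

# Distance of the average iterate to the generating x_true (small, up to the
# 0.01 noise added to b).
print(np.linalg.norm(x.mean(axis=0) - x_true))
```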
Distributed stochastic nonconvex optimization problems have recently received attention due to the growing interest of the signal processing, computer vision, and natural language processing communities in applications deployed over distributed learning systems (e.g., federated learning). We study the setting where the data is distributed across the nodes of a time-varying directed network, a topology suitable for modeling dynamic networks experiencing communication delays and straggler effects. The network nodes, which can access only their local objectives and query a stochastic first-order oracle to obtain gradient estimates, collaborate to minimize a global objective function by exchanging messages with their neighbors. We propose an algorithm, novel to this setting, that leverages stochastic gradient descent with momentum and gradient tracking to solve distributed nonconvex optimization problems over time-varying networks. To analyze the algorithm, we tackle the challenges that arise when analyzing dynamic network systems that communicate gradient acceleration components. We prove that the algorithm's oracle complexity is O(1/ε^1.5), and that under the Polyak-Łojasiewicz (PL) condition the algorithm converges linearly to a steady-state error. The proposed scheme is tested on several learning tasks: a nonconvex logistic regression experiment on the MNIST dataset, an image classification task on the CIFAR-10 dataset, and an NLP classification task on the IMDB dataset. We further present numerical simulations with an objective that satisfies the PL condition. The results demonstrate superior performance of the proposed framework compared to the existing related methods.
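A minimal sketch of combining stochastic gradients, momentum, and gradient tracking over a time-varying network is given below on a toy decentralized quadratic. It alternates between two doubly stochastic mixing matrices as a simple stand-in for the time-varying directed graphs (and associated weights) treated in the paper; the constants, update ordering, and toy objective are all assumptions.

```python
"""Illustrative sketch: stochastic gradients with momentum plus gradient
tracking on a toy decentralized quadratic over a time-varying network.
Not the paper's algorithm; all names and constants are hypothetical."""
import numpy as np

rng = np.random.default_rng(2)
n_nodes, d = 4, 3
targets = rng.normal(size=(n_nodes, d))   # node i minimizes 0.5 * ||x - targets[i]||^2

def stoch_grad(i, x):
    # Noisy first-order oracle for node i's local objective.
    return (x - targets[i]) + 0.1 * rng.normal(size=d)

# Two doubly stochastic mixing matrices; the network alternates between them.
W_ring = np.array([[0.5, 0.5, 0.0, 0.0],
                   [0.0, 0.5, 0.5, 0.0],
                   [0.0, 0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.0, 0.5]])
mixers = [W_ring, W_ring.T]

alpha, beta = 0.05, 0.9
x = np.zeros((n_nodes, d))
v = np.array([stoch_grad(i, x[i]) for i in range(n_nodes)])   # momentum buffer
y = v.copy()                                                  # gradient tracker

for k in range(500):
    W = mixers[k % 2]                                # time-varying topology
    x = W @ x - alpha * y                            # consensus + tracked step
    g = np.array([stoch_grad(i, x[i]) for i in range(n_nodes)])
    v_new = beta * v + (1.0 - beta) * g              # momentum-filtered gradient
    y = W @ y + v_new - v                            # track the network average of v
    v = v_new

# The average iterate should sit near the global minimizer targets.mean(axis=0),
# up to a noise floor from the stochastic gradients.
print(np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))
```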