ISBN:
(Print) 9781450361842
We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the widely popular Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by the previous empirical studies of distributed graph algorithms [9, 28] using MapReduce and a distributed hash table service [17]. This extension allows us to give new graph algorithms with much lower round complexities compared to the best known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in O(1) rounds, and connectivity/minimum spanning tree in O(log log_{m/n} n) rounds, which is an exponential improvement upon the best known algorithms in the MPC model with sublinear space per machine. Our results imply that the 2-Cycle conjecture, the most popular hardness conjecture in the MPC model, does not hold in the AMPC model.
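The abstract's key primitive, writes within a round followed by adaptive random reads in the next round, can be caricatured in a few lines. The DistributedStore class and its read/write API below are invented purely for illustration and are not the paper's interface:

```python
class DistributedStore:
    """In-memory stand-in for the shared data store between AMPC rounds."""
    def __init__(self):
        self.rounds = []            # rounds[r]: key -> value written in round r

    def new_round(self):
        self.rounds.append({})

    def write(self, key, value):
        self.rounds[-1][key] = value            # writes target the current round

    def read(self, round_idx, key):
        return self.rounds[round_idx].get(key)  # random read access to past rounds

store = DistributedStore()

# Round 0: each "machine" publishes a local value.
store.new_round()
for machine_id in range(4):
    store.write(f"m{machine_id}", machine_id * 10)

# Round 1: machine 0 chooses its second read based on the first one, the
# adaptivity that plain MPC machines (which must address all messages
# before the round starts) do not have.
store.new_round()
first = store.read(0, "m0")                    # reads 0
chosen = store.read(0, f"m{first // 10 + 2}")  # key depends on prior read
store.write("m0-result", first + chosen)
print(store.read(1, "m0-result"))              # prints 20
```

The point of the sketch is only the dependence of the second read on the first; per-machine communication budgets and parallelism are elided.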
ISBN:
(Print) 9783030312770; 9783030312763
Oblivious Mobile Robots have been studied both in continuous Euclidean spaces and in discrete spaces (that is, graphs). However, the literature obtained so far forms distinct sets of results for the two settings. In our view, the continuous model reflects well the physicality of robots operating in some real environment, while the discrete model reflects well the digital nature of autonomous robots, whose sensors and computing capabilities are inherently finite. We explore the possibility of bridging results between the two models. Our approach is certified using the Coq proof assistant and the Pactole framework, which we extend to the most general asynchronous model without compromising its genericity. Our extended framework is then used to formally prove the equivalence between atomic moves in a discrete space (the classical "robots on graphs" model) and non-atomic moves in a continuous unidimensional space when robot vision sensors are discrete (robots move in straight lines between positions, but their observations are at source and destination positions only), irrespective of the problem being solved. Our effort consolidates the integration between the model, the problem specification, and its proof that is advocated by the Pactole framework.
There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. The literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and has shown that they have the same convergence rate O(log t / √t) as the centralized (sub)gradient method (CGD) when the function is convex but possibly nonsmooth. However, when the function is convex and smooth, under the framework of DGD, it is unclear how to harness the smoothness to obtain a faster convergence rate comparable to CGD's convergence rate. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, can effectively harness the function smoothness and converge to the optimum with a rate of O(1/t). If the objective function is further strongly convex, our algorithm has a linear convergence rate. Both rates match the convergence rate of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve fast and accurate estimation of the average gradient. To motivate the necessity of history information, we also show that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information, even if the objective function is strongly convex and smooth.
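One well-known way to use history to track the average gradient is the gradient-tracking correction s ← W s + ∇f(x_new) − ∇f(x_old); the paper's exact scheme may differ, and the two-agent quadratic losses below are invented for illustration:

```python
# Gradient-tracking sketch: two agents minimize f(x) = f1(x) + f2(x),
# f_i(x) = (x - a_i)^2, so the global optimum is the average target 3.0.
a = [1.0, 5.0]                                  # local targets (illustrative)
grad = lambda x: [2.0 * (xi - ai) for xi, ai in zip(x, a)]
mix = lambda v: [sum(v) / 2.0] * 2              # doubly stochastic W = [[.5,.5],[.5,.5]]
eta = 0.1                                       # step size

x = [0.0, 0.0]                                  # local iterates
s = grad(x)                                     # trackers, initialized to local grads

for _ in range(200):
    g_old = grad(x)
    x = [wx - eta * si for wx, si in zip(mix(x), s)]   # consensus + tracked step
    # History term: s keeps tracking the network-average gradient.
    s = [ws + gn - go for ws, gn, go in zip(mix(s), grad(x), g_old)]

print([round(v, 3) for v in x])                 # both agents near 3.0
```

The invariant worth noting is that the average of s always equals the average of the current local gradients, which is exactly what plain DGD lacks.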
Processing data collected by a network of agents often boils down to solving an optimization problem. The distributed nature of these problems calls for methods that are, themselves, distributed. While most collaborative learning problems require agents to reach a common (or consensus) model, there are situations in which the consensus solution may not be optimal. For instance, agents may want to reach a compromise between agreeing with their neighbors and minimizing a personal loss function. We present DJAM, a Jacobi-like distributed algorithm for learning personalized models. This method is implementation-friendly: it has no hyperparameters that need tuning, it is asynchronous, and its updates only require single-neighbor interactions. We prove that DJAM converges with probability one to the solution, provided that the personal loss functions are strongly convex and have Lipschitz gradient. We then give evidence that DJAM is on par with state-of-the-art methods: our method reaches a solution with error similar to the error of a carefully tuned alternating direction method of multipliers (ADMM) in about the same number of single-neighbor interactions.
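The "compromise between agreeing with neighbors and a personal loss" objective admits a closed-form Jacobi update when the losses are quadratic. The sketch below is a synchronous simplification (DJAM itself is asynchronous with single-neighbor interactions) and the three-agent path graph, targets, and weight rho are all invented:

```python
# Each agent i minimizes (x_i - theta_i)^2 + rho * sum_{j in N(i)} (x_i - x_j)^2.
theta = [0.0, 4.0, 8.0]                  # personal targets on a 3-node path
neighbors = {0: [1], 1: [0, 2], 2: [1]}
rho = 1.0                                # agreement weight (illustrative)
x = [0.0, 0.0, 0.0]                      # personalized models

for _ in range(200):                     # Jacobi sweeps: all agents update at once
    # Exact minimizer of agent i's local objective given neighbors' models:
    x = [
        (theta[i] + rho * sum(x[j] for j in neighbors[i]))
        / (1 + rho * len(neighbors[i]))
        for i in range(3)
    ]

print([round(v, 3) for v in x])          # compromise models [2.0, 4.0, 6.0]
```

Note the fixed point is neither consensus (all equal) nor the purely personal targets: each agent lands between its own theta and its neighbors' models, which is the behavior the abstract describes.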
While mining association rules in a distributed database, overhead increases at each site because of linkage and dependency with other sites. Each site scans the database not only for itself but for the neighboring sites as well. In the most popular Count Distribution (CD) and Fast Distributed Mining (FDM) algorithms, sites generate and scan identical candidate itemsets. In the CD algorithm, sites generate candidate (k+1)-itemsets using the global frequent k-itemsets, and in the FDM algorithm, sites generate them using their own and neighboring sites' heavy frequent k-itemsets. Most of the itemsets scanned by the CD algorithm are infrequent. These infrequent itemsets are not scanned in the FDM algorithm. However, in the FDM algorithm, some itemsets may turn out to be frequent at none of the sites yet are scanned at all of them. In this paper, an efficient framework and an algorithm are proposed for mining association rules in a distributed database. In the proposed framework, the overhead of each site for generating and scanning candidate itemsets for the neighboring sites is first reduced. Then, a site either does not scan a neighboring site's candidate k-itemsets or postpones doing so until its own (k+1)-itemsets are scanned.
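The candidate-generation step that both CD and FDM build on is the Apriori one: join frequent k-itemsets, then prune any candidate with an infrequent k-subset. A minimal sketch, with a made-up toy set of frequent 2-itemsets:

```python
from itertools import combinations

def generate_candidates(frequent_k):
    """Apriori-style join + prune. frequent_k: set of frozensets of size k."""
    frequent_k = set(frequent_k)
    k = len(next(iter(frequent_k)))
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == k + 1:
                # Prune: every k-subset of a candidate must itself be frequent.
                if all(frozenset(s) in frequent_k
                       for s in combinations(union, k)):
                    candidates.add(union)
    return candidates

frequent_2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
cands = generate_candidates(frequent_2)
# 'abc' survives; 'abd' is pruned because subset 'ad' is not frequent.
print(sorted("".join(sorted(c)) for c in cands))
```

CD runs this join over the *global* frequent k-itemsets at every site, which is exactly why all sites end up scanning identical (and often infrequent) candidates.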
ISBN:
(Print) 9781450362016
We propose the first adversarially robust algorithm for monotone submodular maximization under single and multiple knapsack constraints with scalable implementations in distributed and streaming settings. For a single knapsack constraint, our algorithm outputs a robust summary of almost optimal (up to polylogarithmic factors) size, from which a constant-factor approximation to the optimal solution can be constructed. For multiple knapsack constraints, our approximation is within a constant factor of the best known non-robust solution. We evaluate the performance of our algorithms by comparison to natural robustifications of existing non-robust algorithms under two objectives: 1) dominating set for large social network graphs from Facebook and Twitter collected by the Stanford Network Analysis Project (SNAP), 2) movie recommendations on a dataset from MovieLens. Experimental results show that our algorithms give the best objective for a majority of the inputs and show strong performance even compared to offline algorithms that are given the set of removals in advance.
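For context, the classical non-robust baseline that such work typically robustifies is the cost-scaled greedy under a knapsack: repeatedly pick the element with the best marginal-gain-per-cost ratio that still fits the budget. The coverage objective and the data below are illustrative, not from the paper's experiments:

```python
def greedy_knapsack(universe_sets, costs, budget):
    """Cost-benefit greedy for monotone submodular (here: set coverage)."""
    chosen, covered, spent = [], set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for e, s in universe_sets.items():
            if e in chosen or spent + costs[e] > budget:
                continue
            gain = len(s - covered)          # marginal coverage gain
            ratio = gain / costs[e]          # benefit per unit of cost
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:                     # nothing fits or nothing helps
            return chosen, covered
        chosen.append(best)
        covered |= universe_sets[best]
        spent += costs[best]

sets = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6, 7}, "s4": {1, 7}}
costs = {"s1": 1.0, "s2": 1.0, "s3": 2.0, "s4": 1.0}
chosen, covered = greedy_knapsack(sets, costs, budget=3.0)
print(chosen, sorted(covered))   # picks s1 then s3, covering all of 1..7
```

A known caveat: ratio-greedy alone has no constant-factor guarantee under a knapsack; the standard fix is to also compare against the best single affordable element. Robust versions must additionally survive the adversarial removal of elements from the summary.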
ISBN:
(Print) 9781728154169
Big data analytics are pervasive in data-intensive systems and applications using machine learning and deep learning. IoT sensor devices are generating all types of big data, including structured (e.g., tables) and unstructured data (e.g., text and image), which are beyond the processing power of humans. However, how to conduct big data analysis on IoT data without compromising IoT privacy is still an open problem. IoT systems should leverage powerful learning techniques to automatically learn patterns such as similarities, correlations, and abnormalities from big sensing data in a privacy-preserving manner. To make this happen, we first examine distributed learning techniques that are suitable for IoT architectures. We then propose a privacy-preserving distributed learning framework with a novel dynamic deep learning mechanism to extract patterns and learn knowledge from IoT data. Simulations are performed to show the effectiveness and efficiency of our solution.
ISBN:
(Print) 9781538692912
Many big data algorithms executed on MapReduce-like systems have a shuffle phase that often dominates the overall job execution time. Recent work has demonstrated schemes where the communication load in the shuffle phase can be traded off for the computation load in the map phase. In this work, we focus on a class of distributed algorithms, broadly used in deep learning, where intermediate computations of the same task can be combined. Even though prior techniques reduce the communication load significantly, they require a number of jobs that grows exponentially in the system parameters. This limitation is crucial and may diminish the load gains as the algorithm scales. We propose a new scheme which achieves the same load as the state-of-the-art while ensuring that the number of jobs as well as the number of subfiles that the data set needs to be split into remain small.
ISBN:
(Print) 9781450360944
Mutual exclusion is one of the fundamental problems in distributed computing, but existing mutual exclusion algorithms are unadapted to the dynamics and lack of membership knowledge of current distributed systems (e.g., mobile ad-hoc networks, peer-to-peer systems, etc.). Additionally, in order to circumvent the impossibility of solving mutual exclusion in asynchronous message-passing systems where processes can crash, some solutions rely on (T+Σ_l) [3], which is the weakest failure detector to solve mutual exclusion in known static distributed systems. In this paper, we define a new failure detector TΣ_lr which is equivalent to (T+Σ_l) in known static systems, and prove that TΣ_lr is the weakest failure detector to solve mutual exclusion in unknown dynamic systems with partial memory losses. We consider that crashed processes may recover.
ISBN:
(Print) 9781450369046
Mobile crowdsensing (MCS) has become a popular paradigm for data collection in urban environments. In MCS systems, a crowd supplies sensing information for monitoring phenomena through mobile devices. Typically, a large number of participants is required to make a sensing campaign successful. For this reason, it is often not practical for researchers to build and deploy large testbeds to assess the performance of frameworks and algorithms for data collection, user recruitment, and evaluating the quality of information. Simulations offer a valid alternative. In this paper, we present CrowdSenSim 2.0, a significant extension of the popular CrowdSenSim simulation platform. CrowdSenSim 2.0 features a stateful approach to support algorithms where the chronological order of events matters, extensions of the architectural modules, including an additional system to model urban environments, code refactoring, and parallel execution of algorithms. All these improvements boost the performance of the simulator and significantly lower runtime execution and memory utilization, also enabling support for larger simulation scenarios. We demonstrate backward compatibility with the older platform and, as a case study, evaluate a stateful data collection algorithm.