We consider the minimal-cost server configuration for meeting resource demands over multiple time slots. Specifically, there are heterogeneous servers. Each server is specified by a cost, certain amounts of several resources, and an active interval, i.e., the time interval during which the server is planned to work. There are different overall demands for each type of resource over different time slots. A feasible solution is a set of servers such that, at any time slot, the resources provided by the selected servers are at least the corresponding demands. Notice that a selected server cannot provide resources for time slots outside its active interval. The total cost of a solution is the sum of the costs of all selected servers. The goal is to find a feasible solution with minimal total cost. This problem is proved to be NP-hard via a reduction from the multidimensional knapsack problem (MKP), a well-known NP-hard combinatorial optimization problem. To solve our problem, we present a randomized approximation algorithm called the partial rounding algorithm (PRA), which guarantees an O(log(KT))-approximation, i.e., an eta*log(KT)-approximation, where K is the number of kinds of resources, T is the number of time slots, and eta is a positive constant. Furthermore, to make eta as small as possible, we propose a variant of the Chernoff bound and apply it in PRA. We perform extensive experiments with random inputs and a specific application input. The results show that PRA with our Chernoff-bound variant finds solutions close to the optimal one.
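The rounding idea behind an algorithm like PRA can be illustrated with a standard randomized-rounding sketch: take the union of O(log(KT)) independent rounding trials over a fractional solution of the LP relaxation. The instance encoding, the fractional solution `x` (assumed given; solving the LP is out of scope here), and the cheapest-first fallback are all illustrative, not the paper's actual PRA.

```python
import math
import random

def feasible(chosen, servers, demands):
    """Check that the chosen servers cover every (resource, slot) demand."""
    K, T = len(demands), len(demands[0])
    for k in range(K):
        for t in range(T):
            supply = sum(s["res"][k] for i, s in enumerate(servers)
                         if i in chosen and s["start"] <= t <= s["end"])
            if supply < demands[k][t]:
                return False
    return True

def randomized_round(x, servers, demands, seed=0):
    """Round a fractional solution x: union of O(log(K*T)) independent
    trials, then (as a crude fallback, unlike the paper's partial
    rounding) add cheapest remaining servers until feasible."""
    rng = random.Random(seed)
    K, T = len(demands), len(demands[0])
    trials = max(1, math.ceil(2 * math.log(max(2, K * T))))
    chosen = set()
    for _ in range(trials):
        for i, xi in enumerate(x):
            if rng.random() < xi:     # pick server i with probability x_i
                chosen.add(i)
    for i in sorted(range(len(servers)), key=lambda i: servers[i]["cost"]):
        if feasible(chosen, servers, demands):
            break
        chosen.add(i)
    return chosen
```

Each trial keeps the expected cost within a constant factor of the LP optimum; the union over O(log(KT)) trials is what drives the logarithmic approximation factor, with a Chernoff-type bound controlling the probability that some (resource, slot) pair stays uncovered.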
ISBN (Print): 9781538669792
Tensor decomposition methods are well-known tools for multilinear feature extraction from multi-way arrays, with many important applications in signal processing and machine learning. Nonnegative Tensor Factorization (NTF) is a particular case of such methods, mostly used for processing nonnegative multi-way arrays, such as hyperspectral observations or a set of images. One of the most efficient algorithms for NTF is the Hierarchical Alternating Least Squares (HALS) algorithm, which belongs to a family of coordinate gradient descent updates. Despite its very good numerical properties, its computational complexity is quite large for large-scale datasets. In this study, we propose a randomized extension of HALS, which considerably decreases its computational complexity with respect to the standard HALS. Numerical experiments, performed on various large-scale observations, confirm that the proposed algorithm is much faster than the standard one at the cost of slightly decreased performance.
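For intuition, here is a minimal HALS loop for the 2-way special case Y ≈ A·Bᵀ with A, B ≥ 0 (the nonnegative matrix factorization analogue of NTF). This is the standard column-wise update, not the paper's randomized variant; roughly speaking, the randomized extension would run the same updates against a randomly compressed sketch of Y instead of Y itself.

```python
import numpy as np

def hals_nmf(Y, rank, n_iter=200, seed=0):
    """Plain HALS for Y ~= A @ B.T with A, B >= 0, updating one factor
    column at a time via a projected residual step."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = rng.random((m, rank))
    B = rng.random((n, rank))
    eps = 1e-12
    for _ in range(n_iter):
        W, V = Y @ B, B.T @ B          # update columns of A
        for j in range(rank):
            A[:, j] = np.maximum(
                eps, A[:, j] + (W[:, j] - A @ V[:, j]) / max(V[j, j], eps))
        W, V = Y.T @ A, A.T @ A        # update columns of B symmetrically
        for j in range(rank):
            B[:, j] = np.maximum(
                eps, B[:, j] + (W[:, j] - B @ V[:, j]) / max(V[j, j], eps))
    return A, B
```

Each inner step solves a nonnegativity-constrained least-squares problem for a single column in closed form, which is why HALS converges quickly per pass but becomes expensive when Y is large; compressing Y is where the randomized speed-up comes from.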
The constantly growing wireless bandwidth demand is pushing wireless networks to multi-tier architectures consisting of a macrocell tier and a number of dense small cell deployment tiers. In such a multi-tier multi-cell environment, the classic problem of associating users to base stations becomes both more challenging and more critical to the overall network performance. Most previous analytical work is focused on designing static user-cell association algorithms, which, to achieve optimality, are periodically applied whenever there are new user arrivals, thus potentially inducing a large number of re-associations for previously arrived users. On the other hand, practical online algorithms that do not allow any such user re-association are often based on heuristics and may not have any performance guarantees. In this paper, we propose online algorithms for the multi-tier multi-cell user association problem that have provable performance guarantees, which improve previously known bounds by a sizable amount. The proposed algorithms are motivated by online combinatorial auctions, while capturing and leveraging the relative sparsity of choices in wireless networks as compared with auction setups. Our champion algorithm is a 1/(2 alpha)-approximation algorithm, where alpha is the maximum number of feasible associations for a user and is, in general, small due to path loss. Our analysis considers state-of-the-art wireless technologies, such as massive and multiuser MIMO, and practical aspects of the system such as the fact that highly mobile users have a preference to connect to larger cell tiers to keep the signaling overhead low. In addition to establishing formal performance bounds, we also conduct simulations under realistic assumptions, which establish the superiority of the proposed algorithm over existing approaches under real-world scenarios.
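The no-re-association constraint can be illustrated with a toy greedy rule in the same spirit: each arriving user is irrevocably assigned to the feasible base station offering the best load-discounted rate. The utility function and capacity model here are illustrative stand-ins, not the paper's auction-based mechanism.

```python
def assign_online(arrivals, stations, rate, capacity):
    """Associate users in arrival order, irrevocably. rate(u, s) == 0
    encodes an infeasible pair (e.g. too much path loss), so each user
    has only a small number alpha of candidate stations."""
    load = {s: 0 for s in stations}
    assignment = {}
    for u in arrivals:
        candidates = [s for s in stations
                      if rate(u, s) > 0 and load[s] < capacity[s]]
        if candidates:
            # marginal utility: achievable rate discounted by current load
            best = max(candidates, key=lambda s: rate(u, s) / (load[s] + 1))
            assignment[u] = best
            load[best] += 1
    return assignment
```

Because each user only ever competes for its alpha feasible stations, an online rule of this shape loses at most a factor depending on alpha against the offline optimum, which is the kind of structure the paper's auction-based analysis exploits.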
ISBN (Print): 9781450341844
The ever-growing wireless bandwidth demand is pushing WiFi and cellular networks to dense multi-cell deployments, as well as to multi-tier architectures consisting of macrocells and small cells. In such a multi-tier multi-cell environment, the classic problem of associating users to base stations becomes both more challenging and more critical to the overall network performance. Most previous analytical work is focused on offline/static user-cell association, where the users' arrivals and their rates are assumed to be known in advance, and thus has little practical relevance. On the other hand, practical online algorithms based on heuristics are often suboptimal and may not provide any performance guarantees. In this paper, we propose an online algorithm for the multi-tier multi-cell user association problem that has a provable performance guarantee, which improves previously known bounds by a sizable amount. The proposed algorithm is motivated by online combinatorial auctions, while capturing and leveraging the relative sparsity of choices in wireless networks as compared to auction setups. Specifically, it is a 1/(2 alpha)-approximation algorithm, where alpha is the maximum number of feasible associations for a user and is, in general, small due to path loss. In addition to establishing formal performance bounds, we also conduct simulations under realistic assumptions which establish the superiority of the proposed algorithm over existing approaches under real-world scenarios.
ISBN (Print): 9781467377416
We investigate the problem of estimating on the fly the frequency at which items recur in large-scale distributed data streams, which have become the norm in cloud-based applications. This paper presents CASE, a combination of tools and probabilistic algorithms from the data streaming model. In this model, functions are estimated over a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the value domain from which data items are drawn. We derive upper and lower bounds on the quality of CASE, improving upon the Count-Min sketch algorithm, which has, so far, been the best algorithm in terms of space and time performance for estimating the frequency of data items. We prove that CASE guarantees an (epsilon, delta)-approximation of the frequency of all items, provided they are not rare, in a space-efficient way and for any input stream. Experiments on both synthetic and real datasets confirm our analysis.
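As a point of reference, the Count-Min sketch that CASE is compared against can be implemented in a few lines. This is the standard algorithm, with width ⌈e/epsilon⌉ and depth ⌈ln(1/delta)⌉; the hashing scheme here (salted built-in `hash`) is a simplification of the pairwise-independent hash families usually assumed.

```python
import math
import random

class CountMin:
    """Count-Min sketch: query() overestimates the true frequency of an
    item by at most epsilon*N with probability at least 1 - delta,
    where N is the total stream length."""
    def __init__(self, eps, delta, seed=0):
        self.w = math.ceil(math.e / eps)          # columns per row
        self.d = math.ceil(math.log(1 / delta))   # number of rows
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(64) for _ in range(self.d)]
        self.table = [[0] * self.w for _ in range(self.d)]

    def _cols(self, item):
        for salt in self.salts:
            yield hash((salt, item)) % self.w

    def update(self, item, count=1):
        for row, col in enumerate(self._cols(item)):
            self.table[row][col] += count

    def query(self, item):
        # taking the min over rows filters out most hash collisions
        return min(self.table[row][col]
                   for row, col in enumerate(self._cols(item)))
```

The estimate is always an upper bound on the true count, which is exactly the one-sided error that CASE's analysis sharpens for non-rare items.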
ISBN (Print): 9781509018499
Estimating the frequency of any piece of information in large-scale distributed data streams has become of utmost importance in the last decade (e.g., in the context of network monitoring, big data, etc.). Although some elegant solutions have been proposed recently, their approximation is computed from the inception of the stream. In a runtime distributed context, one would prefer to gather information only about the recent past. This may be driven by the need to save resources or by the fact that recent information is more relevant. In this paper, we consider the sliding window model and propose two different (online) algorithms that approximate item frequencies in the active window. More precisely, we determine an (epsilon, delta)-additive approximation, meaning that the error is greater than epsilon only with probability delta. These solutions use a very small amount of memory with respect to the size N of the window and the number n of distinct items in the stream, namely, O(1/epsilon log(1/delta) (log N + log n)) and O(1/(tau epsilon) log(1/delta) (log N + log n)) bits of space, where tau is a parameter limiting memory usage. We also provide their distributed variants, i.e., considering the sliding window functional monitoring model. We compared the proposed algorithms to each other and also to the state of the art through extensive experiments on synthetic traces and real datasets, which validate the robustness and accuracy of our algorithms.
In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we propose a novel algorithm, AnKLe, for estimating the Kullback-Leibler divergence of an observed stream compared with the expected one. AnKLe combines sampling techniques and information-theoretic methods. It is very efficient, both in terms of space and time complexities, and requires only a single pass over the data stream. We show that AnKLe is an (epsilon, delta)-approximation algorithm with a space complexity of Õ(1/epsilon + 1/epsilon^2) bits in "most" cases, and Õ(1/epsilon + n^(1-epsilon)/epsilon^2) bits otherwise, where n is the number of distinct data items in the stream. Moreover, we propose a distributed version of AnKLe that requires at most O(rl(log n + 1)) bits of communication between the l participating nodes, where r is the number of rounds of the algorithm. Experimental results show that the estimation provided by AnKLe remains accurate even for different adversarial settings for which the quality of other methods dramatically decreases.
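The quantity AnKLe estimates can be stated directly: the KL divergence between the stream's empirical distribution and the expected one (taken as uniform here as a stand-in). The exact computation below needs memory linear in the number of distinct items; AnKLe's contribution is approximating it one-pass in sublinear space.

```python
from collections import Counter
import math

def empirical_kl(stream, domain):
    """Exact KL(p_hat || q) where p_hat is the stream's empirical
    distribution and q is uniform over `domain`. Items absent from the
    stream contribute 0 (the 0*log(0) convention)."""
    counts = Counter(stream)
    m = len(stream)
    q = 1.0 / len(domain)            # expected (uniform) probability
    kl = 0.0
    for item, c in counts.items():
        p = c / m
        kl += p * math.log(p / q)
    return kl
```

A stream matching the expected distribution yields a divergence near 0, while a heavily biased (tampered) stream yields a large value, which is what makes the divergence usable as an attack detector.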
We investigate a metric facility location problem in a distributed setting. In this problem, we assume that each point is a client as well as a potential location for a facility, and that the opening costs for the facilities and the demands of the clients are uniform. The goal is to open a subset of the input points as facilities such that the accumulated cost for the whole point set, consisting of the opening costs for the facilities and the connection costs for the clients, is minimized. We present a randomized distributed algorithm that computes, in expectation, an O(1)-approximate solution to the metric facility location problem described above. Our algorithm works in a synchronous message-passing model, where each point is an autonomous computational entity that has its own local memory and communicates with the other entities by message passing. We assume that each entity knows the distance to all the other entities, but does not know any of the other pairwise distances. Our algorithm uses three rounds of all-to-all communication with message sizes bounded to O(log n) bits, where n is the number of input points. We extend our distributed algorithm to constant powers of metric spaces. For a metric exponent ℓ ≥ 1, we obtain a randomized O(1)-approximation algorithm that uses three rounds of all-to-all communication with message sizes bounded to O(log n) bits.
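A compact sequential baseline for uniform facility location is the classic Mettu-Plaxton rule (a centralized 3-approximation, not the paper's distributed protocol, though distributed algorithms for this problem are commonly built in its spirit): for each point v compute the radius r(v) with Σ_u max(0, r(v) − d(u, v)) = f, then scan points by increasing r(v) and open v unless an already-open facility lies within 2·r(v).

```python
def mp_radius(dists, f):
    """Smallest r with sum(max(0, r - d) for d in dists) == f, where f is
    the uniform opening cost; dists includes the point's distance 0 to
    itself, so the sum is piecewise linear and increasing in r."""
    ds = sorted(dists)
    prefix = 0.0
    for k, d in enumerate(ds, start=1):
        prefix += d
        r = (f + prefix) / k         # solve k*r - prefix = f on this segment
        nxt = ds[k] if k < len(ds) else float("inf")
        if r <= nxt:                 # solution lies in [d, nxt)
            return r

def mettu_plaxton(points, f, dist):
    """Open v (in increasing-radius order) unless an open facility is
    already within 2*r(v)."""
    r = {v: mp_radius([dist(v, u) for u in points], f) for v in points}
    opened = []
    for v in sorted(points, key=lambda v: r[v]):
        if all(dist(v, w) > 2 * r[v] for w in opened):
            opened.append(v)
    return opened

def total_cost(points, opened, f, dist):
    return f * len(opened) + sum(min(dist(v, w) for w in opened)
                                 for v in points)
```

On well-separated clusters this opens roughly one facility per cluster; the radii r(v) also admit the local, distance-based computation that makes the rule attractive for distributed settings.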
ISBN (Print): 9781479901814; 9781467364713
We consider the problem of achieving uniform node sampling in large-scale systems in the presence of a strong adversary. We first propose an omniscient strategy that processes on the fly an unbounded and arbitrarily biased input stream made of node identifiers exchanged within the system, and outputs a stream that preserves the Uniformity and Freshness properties. We show through Markov chain analysis that both properties hold despite any arbitrary bias introduced by the adversary. We then propose a knowledge-free strategy and show through extensive simulations that this strategy accurately approximates the omniscient one. We also evaluate its resilience against a strong adversary by studying two representative attacks (flooding and targeted attacks). We quantify the minimum number of identifiers that the adversary must insert in the input stream to prevent uniformity. To our knowledge, such an analysis has never been proposed before.
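The Uniformity goal can be illustrated with a trivial (memory-unbounded) strategy close to the omniscient one: remember every distinct identifier seen and output one uniformly at random, so repeating an identifier buys the adversary nothing. The knowledge-free strategy in the paper approximates this behavior with bounded memory; this sketch ignores Freshness entirely.

```python
import random

class UniformSampler:
    """Outputs a node id uniformly over the *distinct* ids observed so
    far, regardless of how often the adversary repeats any id in the
    input stream."""
    def __init__(self, seed=0):
        self.ids = []          # distinct ids, in first-seen order
        self.seen = set()
        self.rng = random.Random(seed)

    def observe(self, node_id):
        if node_id not in self.seen:
            self.seen.add(node_id)
            self.ids.append(node_id)

    def sample(self):
        return self.rng.choice(self.ids)
```

Even if one identifier floods the stream, its sampling probability stays 1/(number of distinct ids), which is exactly the bias-resistance that the flooding-attack analysis quantifies for the bounded-memory variant.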
ISBN (Print): 9780769547732
In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we have proposed in prior work AnKLe, a one-pass algorithm for estimating the Kullback-Leibler divergence of an observed stream compared to the expected one. Experimental evaluations have shown that the estimation provided by AnKLe is accurate for different adversarial settings for which the quality of other methods dramatically decreases. In the present paper, considering n as the number of distinct data items in a stream, we show that AnKLe is an (epsilon, delta)-approximation algorithm with a space complexity of Õ(1/epsilon + 1/epsilon^2) bits in "most" cases, and Õ(1/epsilon + n^(1-epsilon)/epsilon^2) bits otherwise. To the best of our knowledge, an approximation algorithm for estimating the Kullback-Leibler divergence has never been analyzed before.