The widespread monitoring of electricity consumption, driven by the increasingly pervasive deployment of networked sensors in urban environments, has resulted in an unprecedentedly large volume of collected data. In particular, as emerging Smart Grid technologies become more ubiquitous, real-time and online analytics for discovering the underlying structure of increasing-dimensional (w.r.t. time) consumer time series data are crucial for converting the massive amount of fine-grained energy information gathered from residential smart meters into appropriate demand response (DR) insights. In this paper we propose READER and OPTIC, real-time and online algorithmic pre-processing frameworks, respectively, for effective DR in the Smart Grid. READER (OPTIC) discovers underlying structure in increasing-dimensional consumer consumption time series data in a provably optimal real-time (online) fashion. READER (OPTIC) catalyzes the efficacy of DR programs by systematically and efficiently managing the energy-consumption data deluge while capturing, in real time (online), specific behavior, i.e., households or time instants with similar consumption patterns. The primary feature of READER (OPTIC) is a real-time (online) randomized approximation algorithm for grouping consumers based on their electricity consumption time series data, which provides two crucial benefits: (i) it time-efficiently handles high-volume, increasing-dimensional time series data, and (ii) it provides provable worst-case grouping performance guarantees. We validate the grouping and DR efficacy of READER and OPTIC via extensive experiments conducted on both a USC microgrid dataset and a synthetically generated dataset.
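The consumer-grouping step described above can be illustrated with a minimal online clustering sketch. This is a generic greedy "leader" clustering, not the actual READER/OPTIC algorithm (whose randomized approximation scheme is not specified in the abstract); the distance threshold and the use of Euclidean distance are illustrative assumptions.

```python
import math

def leader_cluster(streams, threshold):
    """Greedy online grouping: assign each new consumer's consumption
    vector to the nearest existing group centroid, or open a new group
    if every centroid is farther than `threshold` (Euclidean distance).
    Each consumer is seen exactly once, mimicking an online setting."""
    centroids, members = [], []
    for i, x in enumerate(streams):
        best, best_d = None, float("inf")
        for j, c in enumerate(centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = j, d
        if best is None or best_d > threshold:
            centroids.append(list(x))   # start a new group
            members.append([i])
        else:
            g = members[best]
            g.append(i)
            n = len(g)                  # running-mean centroid update
            centroids[best] = [(c * (n - 1) + v) / n
                               for c, v in zip(centroids[best], x)]
    return members
```

For example, four consumers with daily profiles `[0,0], [0.1,0], [5,5], [5,5.1]` and threshold 1.0 fall into two groups, `[0,1]` and `[2,3]`.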
Restarting is a technique frequently employed in randomized algorithms. After some number of computation steps, the state of the algorithm is reinitialized with a new, independent random seed. Luby et al. (Inf. Process. Lett. 47(4), 173-180, 1993) introduced a universal restart strategy and showed that it is an optimal universal strategy in the worst case. However, this optimality result has only been shown for discrete processes. In this work, we show that their result does not carry over to a continuous setting. Furthermore, we show that there are no (asymptotically) optimal universal strategies in the continuous setting. Nevertheless, we obtain an optimal universal strategy for a restricted class of continuous probability distributions. As a side result, we show that the expected value under restarts for the lognormal distribution tends towards 0. Finally, the results are illustrated using simulations.
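The universal strategy of Luby et al. restarts the algorithm after run lengths given by the sequence 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, ... (each power of two appears after all shorter prefixes repeat). A standard recursive sketch of that sequence, for the discrete setting discussed above:

```python
def luby(i):
    """i-th term (1-indexed) of Luby's universal restart sequence
    1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ...
    If i == 2^k - 1 the term is 2^(k-1); otherwise recurse on the
    position within the repeated prefix."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)
```

Running the algorithm with cutoff `luby(i)` on the i-th attempt yields the worst-case-optimal universal behavior shown in the paper cited above (for discrete processes).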
In the Steiner point removal problem, we are given a weighted graph G = (V, E) and a set of terminals K ⊆ V of size k. The objective is to find a minor M of G with only the terminals as its vertex set, such that distances between the terminals will be preserved up to a small multiplicative distortion. Kamma, Krauthgamer, and Nguyen [SIAM J. Comput., 44 (2015), pp. 975-995] devised a ball-growing algorithm with exponential distributions to show that the distortion is at most O(log^5 k). Cheung [Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, 2018, pp. 1353-1360] improved the analysis of the same algorithm, bounding the distortion by O(log^2 k). We devise a novel and simpler algorithm (called the Relaxed-Voronoi algorithm) which incurs distortion O(log k). This algorithm can be implemented in almost linear time (O(|E| log |V|)).
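The Voronoi flavor of these algorithms can be illustrated by a plain terminal-Voronoi partition computed with multi-source Dijkstra: every vertex is assigned to its nearest terminal, and contracting each cell onto its terminal yields a minor on the terminal set. The paper's Relaxed-Voronoi algorithm additionally grows the cells in random order with relaxed (randomly scaled) radii, which this sketch omits.

```python
import heapq

def voronoi_partition(adj, terminals):
    """Assign every vertex to its nearest terminal via multi-source
    Dijkstra. `adj` maps vertex -> list of (neighbor, weight) pairs.
    Returns a dict vertex -> owning terminal. Contracting each cell
    onto its terminal gives a minor with only terminals as vertices."""
    owner = {}
    pq = [(0.0, t, t) for t in terminals]   # (distance, vertex, terminal)
    heapq.heapify(pq)
    while pq:
        d, v, t = heapq.heappop(pq)
        if v in owner:                      # already claimed by a closer cell
            continue
        owner[v] = t
        for u, w in adj[v]:
            if u not in owner:
                heapq.heappush(pq, (d + w, u, t))
    return owner
```

On the path a-b-c-d with unit weights and terminals {a, d}, the partition puts b in a's cell and c in d's cell.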
In this paper we focus on the problem of designing very fast parallel algorithms for the planar convex hull problem that achieve the optimal O(n log H) work bound for input size n and output size H. Our algorithms are designed for the arbitrary CRCW PRAM model. We first describe a very simple O(log n log H) time optimal deterministic algorithm for planar hulls, which is an improvement over the previously known Ω(log^2 n) time algorithm for small outputs. For larger values of H, we can achieve a running time of O(log n log log n) steps with optimal work. We also present a fast randomized algorithm that runs in expected time O(log H log log n) and does optimal O(n log H) work. For log H = Ω(log log n), we can achieve the optimal running time of O(log H) while simultaneously keeping the work optimal. When log H is o(log n), our results improve upon the previously best known Θ(log n) expected time randomized algorithm of Ghouse and Goodrich. The randomized algorithms do not assume any input distribution and the running times hold with high probability. (C) 1997 Elsevier Science B.V.
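Output-sensitivity (bounds that depend on the hull size H rather than only on n) can be illustrated with the classic sequential gift-wrapping algorithm, which runs in O(nH) time. This is far weaker than the parallel O(n log H)-work algorithms described above; it serves only to show how running time can scale with the output size.

```python
def jarvis_hull(pts):
    """Gift-wrapping (Jarvis march): each of the H hull vertices is
    found with one O(n) sweep, so the total time is O(nH).
    Returns hull vertices in counterclockwise order; assumes at least
    three points in general position (no collinear handling)."""
    def cross(o, a, b):
        # > 0 if o->a->b turns counterclockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    start = min(pts)              # lexicographically smallest point is on the hull
    hull, p = [], start
    while True:
        hull.append(p)
        q = pts[0] if pts[0] != p else pts[1]
        for r in pts:
            if cross(p, q, r) < 0:   # r is more clockwise: wrap to it
                q = r
        p = q
        if p == start:
            return hull
```

On a unit square with one interior point, the march visits exactly the four corners.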
The proliferation of video content on the Web makes similarity detection an indispensable tool in Web data management, searching, and navigation. In this paper, we propose a number of algorithms to efficiently measure video similarity. We model a video as a set of frames, which are represented as high-dimensional vectors in a feature space. Our goal is to measure the ideal video similarity (IVS), defined as the percentage of clusters of similar frames shared between two video sequences. Since IVS is too complex to be deployed in large database applications, we approximate it with the Voronoi video similarity (VVS), defined as the volume of the intersection between Voronoi cells of similar clusters. We propose a class of randomized algorithms to estimate VVS by first summarizing each video with a small set of its sampled frames, called the video signature (ViSig), and then calculating the distances between corresponding frames from the two ViSigs. By generating samples with a probability distribution that describes the video statistics, and ranking them based upon their likelihood of making an error in the estimation, we show analytically that the ViSig can provide an unbiased estimate of IVS. Experimental results on a large dataset of Web video and a set of MPEG-7 test sequences with artificially generated similar versions are provided to demonstrate the retrieval performance of our proposed techniques.
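The signature idea can be sketched as follows: summarize each video (a list of frame feature vectors) by a few sampled frames, then score how many signature frames have a near neighbor in the other video. The paper's ViSig estimator additionally samples from a distribution matched to the video statistics and ranks samples by their error likelihood; those refinements, and the parameters `m` (signature size) and `tau` (frame-similarity radius) below, are illustrative assumptions.

```python
import random

def visig_similarity(video_a, video_b, m=2, tau=0.5, seed=0):
    """Estimate similarity between two videos, each a list of frame
    feature vectors: sample m signature frames per video, then return
    the fraction of signature frames whose nearest frame in the OTHER
    video lies within Euclidean radius tau."""
    rng = random.Random(seed)
    def dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
    def hits(sig, other):
        return sum(min(dist(s, f) for f in other) <= tau for s in sig)
    sig_a = rng.sample(video_a, min(m, len(video_a)))
    sig_b = rng.sample(video_b, min(m, len(video_b)))
    return (hits(sig_a, video_b) + hits(sig_b, video_a)) / (len(sig_a) + len(sig_b))
```

Identical videos score 1.0; videos with no nearby frames score 0.0.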
In this paper, we develop a decentralized probabilistic method for performance optimization of cloud services. We focus on Infrastructure-as-a-Service, where the user is provided with the ability to configure virtual resources on demand in order to satisfy specific computational requirements. This novel approach is strongly supported by a theoretical framework based on tail probabilities and sample complexity analysis. It allows not only the inclusion of performance metrics for the cloud but also the incorporation of security metrics based on cryptographic algorithms for data storage. To the best of the authors' knowledge, this is the first unified approach to provisioning performance and security on demand subject to the Service Level Agreement between the client and the cloud service provider. The quality of the service is guaranteed given certain values of accuracy and confidence. We present experimental results using the Amazon Web Services Elastic Compute Cloud (EC2) service to validate our probabilistic optimization method.
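The "accuracy and confidence" guarantee in such sample-complexity analyses is often a Hoeffding-type bound. A minimal sketch, assuming a performance metric bounded in [0, 1] (the paper's exact bounds may differ):

```python
import math

def sample_size(eps, delta):
    """Hoeffding-style sample complexity: number of independent
    benchmark runs needed so that the empirical mean of a [0,1]-bounded
    performance metric is within eps of its true mean with probability
    at least 1 - delta:  n >= ln(2/delta) / (2 * eps^2)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))
```

For accuracy eps = 0.05 and confidence delta = 0.05, this prescribes 738 measurement runs of the provisioned configuration.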
We present randUBV, a randomized algorithm for matrix sketching based on the block Lanczos bidiagonalization process. Given a matrix A, it produces a low-rank approximation of the form UBV^T, where U and V have orthonormal columns in exact arithmetic and B is block bidiagonal. In finite precision, the columns of both U and V will be close to orthonormal. Our algorithm is closely related to the randQB algorithms of Yu, Gu, and Li [SIAM J. Matrix Anal. Appl., 39 (2018), pp. 1339-1359], in that the entries of B are incrementally generated and the Frobenius norm approximation error may be efficiently estimated. It is therefore suitable for the fixed-accuracy problem and is designed to terminate as soon as a user-specified error tolerance is reached. Numerical experiments suggest that the block Lanczos method is generally competitive with or superior to algorithms that use power iteration, even when A has significant clusters of singular values.
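The randQB idea that randUBV builds on can be sketched in a few lines: sample a Gaussian test matrix Omega, form the sketch Y = A·Omega, orthonormalize it to get Q, and set B = Q^T·A so that A ≈ QB. This is a one-pass pure-Python sketch of randQB only; randUBV itself runs block Lanczos bidiagonalization to produce the two-sided factorization U·B·V^T.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def gram_schmidt(Y):
    """Orthonormalize the columns of Y (modified Gram-Schmidt),
    dropping numerically dependent columns."""
    Q = []
    for v in transpose(Y):
        for q in Q:
            c = sum(a * b for a, b in zip(q, v))
            v = [a - c * b for a, b in zip(v, q)]
        n = sum(a * a for a in v) ** 0.5
        if n > 1e-12:
            Q.append([a / n for a in v])
    return transpose(Q)

def rand_qb(A, k, seed=0):
    """Randomized QB sketch: A ~ Q @ B with Q orthonormal (k columns
    of sampling, fewer if A has lower numerical rank)."""
    rng = random.Random(seed)
    n = len(A[0])
    Omega = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    Q = gram_schmidt(matmul(A, Omega))
    B = matmul(transpose(Q), A)
    return Q, B
```

For a rank-1 matrix the sketch recovers A essentially exactly, since the sampled range already spans the column space.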
A fundamental problem when adding column pivoting to the Householder QR factorization is that only about half of the computation can be cast in terms of high-performing matrix-matrix multiplications, which greatly limits the benefits that can be derived from so-called blocking of algorithms. This paper describes a technique for selecting groups of pivot vectors by means of randomized projections. It is demonstrated that the asymptotic flop count for the proposed method is 2mn^2 - (2/3)n^3 for an m x n matrix, identical to that of the best classical unblocked Householder QR factorization algorithm (with or without pivoting). Experiments demonstrate an acceleration in speed of close to an order of magnitude relative to the GEQP3 function in LAPACK, when executed on a modern CPU with multiple cores. Further, experiments demonstrate that the quality of the randomized pivot selection strategy is roughly the same as that of classical column pivoting. The described algorithm is made available under an open source license and can be used with LAPACK or libflame.
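The core trick of picking a block of pivots from a randomized projection can be sketched as follows: compress the m x n matrix A with a small Gaussian matrix G, then select the b columns whose compressed images have the largest norms. The real algorithm also downdates the sketch between blocks and folds the selection into a blocked Householder factorization; both are omitted here, and the oversampling amount is an illustrative assumption.

```python
import random

def randomized_pivot_block(A, b, seed=0):
    """Select b pivot columns of A (given as a list of rows) by
    sketching: form Y = G @ A with a small (b+4) x m Gaussian G, then
    return the indices of the b columns of Y with largest norms.
    Column norms are approximately preserved by the projection, so
    large-norm columns of Y identify strong pivot candidates in A."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    d = b + 4                                   # slight oversampling
    G = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(d)]
    Y = [[sum(G[i][k] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(d)]
    norms = [sum(Y[i][j] ** 2 for i in range(d)) for j in range(n)]
    return sorted(range(n), key=lambda j: -norms[j])[:b]
```

A column that dominates the others in norm is picked first, matching what classical column pivoting would do.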
Product recommendation is one of the most important services on the Internet. In this paper, we consider a product recommendation system which recommends products to a group of users. The recommendation system only has partial preference information on this group of users: a user only indicates his preference for a small subset of products in the form of ratings. This partial preference information makes it a challenge to produce an accurate recommendation. In this work, we explore a number of fundamental questions. What is the desired number of ratings per product to guarantee an accurate recommendation? What are some effective voting rules for summarizing ratings? How may users' misbehavior in product rating, such as cheating, affect the recommendation accuracy? What are some efficient rating schemes? To answer these questions, we present a formal mathematical model of a group recommendation system and formally analyze it. Through this analysis we gain the insight to develop a randomized algorithm which is both computationally efficient and asymptotically accurate in evaluating the recommendation accuracy under a very general setting. We propose a novel and efficient heterogeneous rating scheme which requires equal or less rating workload, but can improve over a homogeneous rating scheme by as much as 30%. We carry out experiments on both synthetic data and real-world data from TripAdvisor. Not only do we validate our model, but we also obtain a number of interesting observations; e.g., a small number of misbehaving users can decrease the recommendation accuracy remarkably. For TripAdvisor, one hundred ratings per product are sufficient to guarantee a high-accuracy recommendation. We believe our model and methodology are important building blocks to refine and improve applications of group recommendation systems. (C) 2014 Published by Elsevier B.V.
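The connection between ratings-per-product and recommendation accuracy can be illustrated with a tiny Monte Carlo sketch under a toy model: each rating independently matches the group's true preference with probability p, and the product is recommended correctly if a strict majority of its ratings agrees. Both p and the majority rule are illustrative assumptions; the paper's randomized algorithm evaluates accuracy under a far more general setting.

```python
import random

def estimate_accuracy(p, ratings_per_product, trials=2000, seed=0):
    """Monte Carlo estimate of majority-vote recommendation accuracy:
    simulate `trials` products, each receiving r independent ratings
    that agree with the true preference with probability p, and return
    the fraction of products on which a strict majority is correct."""
    rng = random.Random(seed)
    r = ratings_per_product
    correct = 0
    for _ in range(trials):
        votes = sum(rng.random() < p for _ in range(r))
        correct += votes > r / 2
    return correct / trials
```

With moderately reliable raters (p = 0.7), around one hundred ratings per product already drive the simulated accuracy close to 1, consistent in spirit with the TripAdvisor observation above.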
We investigate the problem of matrix completion with capped nuclear norm regularization. Different from most existing regularizations that minimize all the singular values simultaneously, the capped nuclear norm only penalizes the singular values smaller than a certain threshold. To handle its non-smoothness and non-convexity, we formulate the problem with the Majorization-Minimization (MM) approach and develop a fast Majorized Proximal Minimization Impute (MPM-Impute) algorithm. At each iteration, the sub-problem is relaxed to a surrogate (upper bound) function and solved via proximal minimization with a closed-form solution. Though it requires a singular value decomposition (SVD) at each iteration, we incorporate a randomized algorithm and propose the randomized Truncated Singular Value Thresholding (RTSVT) operator to lower the computational cost. In addition, in contrast with most MM approaches, our algorithm is guaranteed to converge to the stationary points. Experimental results on synthetic data and image inpainting show that the completion results match or exceed state-of-the-art performance, while running several times faster. (C) 2018 Published by Elsevier B.V.
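The "penalize only the small singular values" idea can be sketched as a shrinkage rule applied to a precomputed list of singular values: values above the cap are kept as-is, values below it are soft-thresholded. The exact scaling constants in the paper's RTSVT operator may differ; this only illustrates the qualitative behavior that distinguishes the capped nuclear norm from ordinary singular value thresholding (which shrinks every singular value).

```python
def capped_threshold(sigmas, theta, lam):
    """Capped-nuclear-norm shrinkage sketch: singular values above the
    cap `theta` are left untouched (they carry no penalty), while those
    at or below it are soft-thresholded by `lam` and floored at zero."""
    return [s if s > theta else max(s - lam, 0.0) for s in sigmas]
```

For example, with cap 2.0 and shrinkage 0.5, the spectrum [5.0, 1.0, 0.2] becomes [5.0, 0.5, 0.0]: the dominant direction is preserved exactly while the small, noise-like singular values are suppressed.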