ISBN:
(Print) 9780769543642
An important first step when deploying a wireless ad hoc network is neighbor discovery, in which every node attempts to determine the set of nodes with which it can communicate within one wireless hop. In recent years, cognitive radio (CR) technology has gained attention as an attractive approach to alleviating spectrum congestion. A CR transceiver can operate over a wide range of frequencies, possibly spanning multiple frequency bands. A CR node can opportunistically utilize unused wireless spectrum without interference from other wireless devices in its vicinity. Due to spatial variations in frequency usage and hardware variations in radio transceivers, different nodes in the network may perceive different subsets of frequencies as available to them for communication. This heterogeneity in the available channel sets across the network increases the complexity of solving the neighbor discovery problem in a CR network. In this paper, we design and analyze several randomized algorithms for neighbor discovery in such a (heterogeneous) network under a variety of assumptions.
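The abstract does not reproduce the paper's algorithms; the sketch below only illustrates the basic randomized idea in a simplified slotted model (all conventions here are assumptions: each slot, every node tunes to a random channel from its own available set and flips a coin to transmit or listen, and two neighbors discover each other when they share a channel with complementary roles):

```python
import random

def discover_neighbors(channel_sets, links, slots=10000, seed=0):
    """Simulate randomized neighbor discovery with heterogeneous channels.

    channel_sets: dict node -> set of channels available to that node
    links: set of frozensets {u, v} of node pairs within one wireless hop
    Returns the set of links on which discovery succeeded.
    """
    rng = random.Random(seed)
    discovered = set()
    for _ in range(slots):
        # Each node independently picks a channel and a transmit/listen role.
        choice = {u: (rng.choice(sorted(chs)), rng.random() < 0.5)
                  for u, chs in channel_sets.items()}
        for link in links:
            u, v = tuple(link)
            cu, tx_u = choice[u]
            cv, tx_v = choice[v]
            # Discovery happens on a slot where both tune to the same
            # channel and exactly one transmits while the other listens.
            if cu == cv and tx_u != tx_v:
                discovered.add(link)
    return discovered

# Heterogeneous availability: node 2 sees only channel 3.
channel_sets = {0: {1, 2}, 1: {2, 3}, 2: {3}}
links = {frozenset({0, 1}), frozenset({1, 2})}
print(discover_neighbors(channel_sets, links) == links)  # True over 10000 slots
```

Each link here shares at least one channel, so per-slot discovery probability is at least 1/8 and all links are found with overwhelming probability over 10000 slots.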
We perform importance sampling for a randomized matrix multiplication algorithm by Drineas, Kannan, and Mahoney and derive probabilities that minimize the expected value (with regard to the distributions of the matrix elements) of the variance. We compare these optimized probabilities with uniform probabilities and derive conditions under which the actual variance of the optimized probabilities is lower. Numerical experiments with query matching in information retrieval applications illustrate that the optimized probabilities produce more accurate matchings than the uniform probabilities and that they can also be computed efficiently.
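The underlying Drineas–Kannan–Mahoney estimator samples column-row outer products and rescales them so the result is unbiased. A minimal NumPy sketch follows, using the standard norm-based probabilities as the importance-sampling distribution; the paper's probabilities, optimized in expectation over the matrix-element distributions, would simply replace `p_opt` below:

```python
import numpy as np

def sampled_matmul(A, B, c, probs, seed=0):
    """Randomized matrix product estimate (Drineas-Kannan-Mahoney style).

    Samples c column-row outer products with replacement according to
    `probs` and rescales each so the estimate is unbiased: E[C] = A @ B.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    idx = rng.choice(n, size=c, p=probs)
    C = np.zeros((A.shape[0], B.shape[1]))
    for t in idx:
        C += np.outer(A[:, t], B[t, :]) / (c * probs[t])
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 200))
B = rng.standard_normal((200, 40))

# Norm-based probabilities p_k proportional to ||A_k|| * ||B^k||
# minimize the expected squared Frobenius error of the estimate.
w = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p_opt = w / w.sum()
p_uni = np.full(200, 1.0 / 200)

exact = A @ B
rel = lambda p: (np.linalg.norm(sampled_matmul(A, B, 2000, p) - exact, 'fro')
                 / np.linalg.norm(exact, 'fro'))
print(f"relative error, norm-based: {rel(p_opt):.2f}; uniform: {rel(p_uni):.2f}")
```

For Gaussian test matrices the column norms are nearly uniform, so the two distributions behave similarly; the paper's comparison concerns structured matrices where they differ.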
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log k) floating-point operations (flops), in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
The prevalence of dynamic-content web services, exemplified by search and online social networking, has motivated an increasingly wide web-facing front end. Horizontal scaling in the Cloud is favored for its elasticity, and distributed design of load balancers is highly desirable. Existing algorithms with a centralized design, such as Join-the-Shortest-Queue (JSQ), incur high communication overhead for distributed dispatchers. We propose a novel class of algorithms called Join-Idle-Queue (JIQ) for distributed load balancing in large systems. Unlike algorithms such as Power-of-Two, the JIQ algorithm incurs no communication overhead between the dispatchers and processors at job arrivals. We analyze the JIQ algorithm in the large-system limit and find that it effectively results in a reduced system load, producing a 30-fold reduction in queueing overhead compared to Power-of-Two at medium to high load. An extension of the basic JIQ algorithm deals with very high loads using only local information of server load. Published by Elsevier B.V.
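The key mechanism is that communication happens when processors become idle, not when jobs arrive. A minimal sketch of that data flow (class and method names are hypothetical, and job service is abstracted away):

```python
import random
from collections import deque

class JIQ:
    """Minimal Join-Idle-Queue sketch: each dispatcher keeps an I-queue
    of idle processors; no dispatcher-processor messages at job arrival."""

    def __init__(self, n_dispatchers, n_processors, seed=0):
        self.rng = random.Random(seed)
        self.iqueues = [deque() for _ in range(n_dispatchers)]
        self.qlen = [0] * n_processors          # jobs queued at each processor
        for p in range(n_processors):           # all start idle: report once
            self._report_idle(p)

    def _report_idle(self, p):
        # An idle processor joins the I-queue of one random dispatcher.
        self.iqueues[self.rng.randrange(len(self.iqueues))].append(p)

    def dispatch(self, d):
        # Job arrival at dispatcher d: take an idle processor if the local
        # I-queue is non-empty, else fall back to a uniform random processor.
        p = (self.iqueues[d].popleft() if self.iqueues[d]
             else self.rng.randrange(len(self.qlen)))
        self.qlen[p] += 1
        return p

    def complete(self, p):
        # On finishing its last job, the processor reports idle again.
        self.qlen[p] -= 1
        if self.qlen[p] == 0:
            self._report_idle(p)

lb = JIQ(n_dispatchers=2, n_processors=4)
p = lb.dispatch(0)
print(lb.qlen[p])  # 1: the job landed on a previously idle processor
```

The fallback to a random processor is one simple choice for the case of an empty I-queue; the paper's extension for very high load refines this using local server-load information.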
ISBN:
(Print) 9783642222993
We consider the problem of two-coloring n-uniform hypergraphs. It is known that any such hypergraph with at most (1/10)√(n/ln n)·2^n hyperedges can be two-colored [7]. In fact, there is an efficient (polynomial time in the size of the input) randomized algorithm that produces such a coloring. As stated in [7], this algorithm requires random access to the hyperedge set of the input hypergraph. In this paper, we show that a variant of this algorithm can be implemented in the streaming model (with just one pass over the input), using space O(|V|·B), where V is the vertex set of the hypergraph and each vertex is represented by B bits. (Note that the number of hyperedges in the hypergraph can be superpolynomial in |V|, and it is not feasible to store the entire hypergraph in memory.) We also consider the question of the minimum number of hyperedges in non-two-colorable n-uniform hypergraphs. Erdős showed that there exist non-2-colorable n-uniform hypergraphs with O(n^2·2^n) hyperedges and Θ(n^2) vertices. We show that the choice Θ(n^2) for the number of vertices in Erdős's construction is crucial: any hypergraph with at most 2n^2/t vertices and 2^n·exp(t/8) hyperedges is 2-colorable. (We present a simple randomized streaming algorithm to construct the two-coloring.) Thus, for example, if the number of vertices is at most n^1.5, then any non-2-colorable hypergraph must have at least 2^n·exp(√n/8) ≫ n^2·2^n hyperedges. We observe that the exponential dependence on t in our result is optimal up to constant factors.
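The recoloring algorithm of [7] is not reproduced in the abstract; the baseline below only illustrates the streaming setting it refers to. A uniformly random coloring is chosen up front, the hyperedge stream is verified in a single pass, and the space used is proportional to |V| regardless of how many hyperedges stream past (each edge is monochromatic with probability 2^(1-n), so few-edge instances succeed with high probability):

```python
import itertools
import random

def streaming_random_two_coloring(n_vertices, edge_stream, seed=0):
    """Baseline sketch (not the recoloring variant of [7]): color every
    vertex uniformly at random, then make one pass over the hyperedge
    stream checking that no edge is monochromatic.  Space is O(|V|)
    colors, independent of the number of hyperedges."""
    rng = random.Random(seed)
    color = [rng.randrange(2) for _ in range(n_vertices)]
    for edge in edge_stream:                      # the single pass
        first = color[next(iter(edge))]
        if all(color[v] == first for v in edge):  # monochromatic: fail
            return None
    return color

# A 10-uniform hypergraph on 20 vertices with 50 edges; Las Vegas retry.
n = 10
edges = list(itertools.combinations(range(20), n))[:50]
coloring = None
for seed in range(100):
    coloring = streaming_random_two_coloring(20, iter(edges), seed=seed)
    if coloring is not None:
        break
print(coloring is not None)  # True: some attempt two-colors all edges
```

Each attempt fails with probability at most 50·2^(-9) ≈ 0.1, so the retry loop succeeds essentially always; the point of the paper is achieving this near the much larger 2^n-scale edge bound.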
ISBN:
(Print) 9781450309349
Verbose web queries are often descriptive in nature, and a term-based search engine is unable to distinguish between the essential and noisy words, which can result in a drift from the user intent. We present a randomized query reduction technique that builds on an earlier learning-to-rank approach. The proposed technique randomly picks only a small set of samples, instead of considering the exponentially many sub-queries, making it fast enough to be useful for web search engines while still covering a wide sub-query space.
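A minimal sketch of the sampling idea follows. The scoring function here is a toy stand-in for the learned ranker, and all names and the 1/2 inclusion probability are hypothetical; the point is that a handful of random sub-queries replaces enumeration of all 2^k of them:

```python
import random

def reduce_query(terms, score, n_samples=50, seed=0):
    """Randomized query reduction sketch: instead of ranking all 2^k
    sub-queries, score only a small random sample and return the best.
    By construction the result never scores worse than the full query."""
    rng = random.Random(seed)
    best, best_score = terms, score(terms)
    for _ in range(n_samples):
        # Keep each term independently with probability 1/2 (never empty).
        sub = [t for t in terms if rng.random() < 0.5] or [rng.choice(terms)]
        s = score(sub)
        if s > best_score:
            best, best_score = sub, s
    return best

# Toy model: essential terms help the score, noisy ones hurt it.
essential = {"flights", "london"}
score = lambda q: sum(1 if t in essential else -1 for t in q)
query = ["cheap", "flights", "to", "london", "please"]
print(reduce_query(query, score))
```

With 50 samples over a 5-term query, the sampler almost always hits a high-scoring sub-query; the paper's contribution is making this work with a real learning-to-rank model at web-search latency.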
ISBN:
(Print) 9783642183805
For a finite, simple, undirected graph G and an integer d ≥ 1, a mindeg-d subgraph is a subgraph of G of minimum degree at least d. The d-girth of G, denoted g_d(G), is the minimum size of a mindeg-d subgraph of G. It is a natural generalization of the usual girth, which coincides with the 2-girth. The notion of d-girth was proposed by Erdős et al. [13, 14] and Bollobás and Brightwell [7] over 20 years ago, and studied from a purely combinatorial point of view. Since then, no new insights have appeared in the literature. Recently, the first algorithmic studies of the problem have been carried out [2, 4]. The current article further explores the complexity of finding a small mindeg-d subgraph of a given graph (that is, approximating its d-girth) by providing new hardness results and the first approximation algorithms in general graphs, as well as analyzing the case where G is planar.
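For intuition, the d-girth can be computed exactly on tiny graphs by brute force: a minimum mindeg-d subgraph can always be taken vertex-induced (inducing only adds edges), so it suffices to scan vertex subsets by increasing size. This sketch measures size in vertices, one common reading of "size" here:

```python
from itertools import combinations

def d_girth(adj, d):
    """Exact d-girth by brute force over vertex subsets (exponential time,
    fine only for tiny graphs).  adj: list of neighbor sets.  Returns the
    minimum number of vertices of an induced subgraph in which every
    vertex has degree >= d inside the subgraph, or None if none exists."""
    n = len(adj)
    for size in range(d + 1, n + 1):       # need at least d+1 vertices
        for S in combinations(range(n), size):
            Sset = set(S)
            if all(len(adj[v] & Sset) >= d for v in S):
                return size
    return None

# K4: the 2-girth coincides with the usual girth, a triangle.
adj = [{1, 2, 3}, {0, 2, 3}, {0, 1, 3}, {0, 1, 2}]
print(d_girth(adj, 2))  # 3
```

The exponential scan is exactly what the hardness results rule out improving in general, motivating the approximation algorithms the article develops.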
The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors, and the coherence is the largest leverage score. These quantities are of interest in recently popular problems such as matrix completion and Nyström-based low-rank matrix approximation, as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd^2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.
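A small NumPy sketch in the spirit of the algorithm: compute R from a sketch S·A and read approximate scores off the rows of A·R⁻¹. A plain Gaussian sketch stands in for the fast subsampled randomized transform the paper uses to reach O(nd log n), and the second projection used for very large d is omitted; only the naive O(nd²) baseline below is the standard exact method.

```python
import numpy as np

def leverage_scores_exact(A):
    """Naive O(n d^2) leverage scores: squared row norms of an
    orthonormal basis for the range of A."""
    Q, _ = np.linalg.qr(A)
    return (Q ** 2).sum(axis=1)

def leverage_scores_approx(A, sketch_rows, seed=0):
    """Sketch-based approximation: QR-factor a small sketch S @ A, then
    the squared row norms of A @ R^{-1} approximate all n scores,
    because the sketch makes A @ R^{-1} nearly orthonormal."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    _, R = np.linalg.qr(S @ A)
    X = np.linalg.solve(R.T, A.T).T      # A @ R^{-1} without explicit inverse
    return (X ** 2).sum(axis=1)

rng = np.random.default_rng(3)
A = rng.standard_normal((2000, 10))
exact = leverage_scores_exact(A)
approx = leverage_scores_approx(A, sketch_rows=400)
rel = np.max(np.abs(approx - exact) / exact)
print(rel < 0.5)  # True: a relative-error approximation for all n scores
```

With a 400-row sketch of a 2000 × 10 matrix, the subspace-embedding distortion is roughly √(d/400) ≈ 0.16, which bounds the relative error of every score uniformly.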
Uncovering the DNA regulatory logic in complex organisms has been one of the important goals of modern biology in the post-genomic era. The sequencing of multiple genomes in combination with the advent of DNA microarrays and, more recently, of massively parallel high-throughput sequencing technologies has made possible the adoption of a global perspective to the inference of the regulatory rules governing the context-specific interpretation of the genetic code that complements the more focused classical experimental approaches. Extracting useful information and managing the complexity resulting from the sheer volume and the high dimensionality of the data produced by these genomic assays has emerged as a major challenge, which we attempt to address in this work by developing computational methods and tools specifically designed for the study of gene regulatory processes in this new global genomic context. First, we focus on the genome-wide discovery of physical interactions between regulatory sequence regions and their cognate proteins at both the DNA and RNA level. We present a motif analysis framework that leverages the genome-wide evidence for sequence-specific interactions between trans-acting factors and their preferred cis-acting regulatory regions. The utility of the proposed framework is demonstrated on DNA and RNA cross-linking high-throughput data. A second goal of this thesis is the development of scalable approaches to dimension reduction based on spectral decomposition and their application to the study of population structure in massive high-dimensional genetic data sets. We have developed computational tools and have performed theoretical and empirical analyses of their statistical properties, with particular emphasis on the analysis of the individual genetic variation measured by Single Nucleotide Polymorphism (SNP) microarrays.
Working in Winfree's abstract tile assembly model, we show that a constant-sized tile assembly system can be programmed through relative tile concentrations to build an n × n square with high probability for any sufficiently large n. This answers an open question of Kao and Schweller [Automata, Languages and Programming, Lecture Notes in Comput. Sci. 5125, Springer, Berlin, 2008, pp. 370-384], who showed how to build an approximately n × n square using tile concentration programming and asked whether the approximation could be made exact with high probability. We show how this technique can be modified to answer another question of Kao and Schweller by showing that a constant-sized tile assembly system can be programmed through tile concentrations to assemble arbitrary finite scaled shapes, which are shapes modified by replacing each point with a c × c block of points for some integer c. Furthermore, we exhibit a smooth trade-off between specifying bits of n via tile concentrations versus specifying them via hard-coded tile types, which allows tile concentration programming to be employed for specifying a fraction of the bits of "input" to a tile assembly system, under the constraint that concentrations can be specified to only a limited precision. Finally, to account for some unrealistic aspects of the tile concentration programming model, we show how to modify the construction to use only concentrations that are arbitrarily close to uniform.
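The core mechanism of concentration programming can be illustrated with a toy decoder, which is a simplified model and not the paper's construction: two tile types compete for a binding site, each attachment picks a tile with probability proportional to its concentration, and a majority over repeated competitions reveals the programmed bit with high probability:

```python
import random

def decode_bit(conc_zero, conc_one, trials, rng):
    """Estimate one concentration-programmed bit: tile types T0 and T1
    compete for a binding site; each attachment selects a tile with
    probability proportional to its concentration, and the majority
    over `trials` independent competitions decodes the bit."""
    p_one = conc_one / (conc_zero + conc_one)
    ones = sum(rng.random() < p_one for _ in range(trials))
    return int(ones > trials / 2)

rng = random.Random(4)
# Program bit 1 with a 2:1 concentration ratio, then decode by sampling.
print(decode_bit(conc_zero=1.0, conc_one=2.0, trials=201, rng=rng))  # 1
```

The limited-precision constraint in the abstract corresponds to how far the ratio may deviate from 1:1; ratios arbitrarily close to uniform require more competitions for the majority to be reliable.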