This paper presents an interesting relation between the Maximum Agreement Subtree (MAST) problem and the Tree Edit Distance (TED) problem, both of which have been intensively studied in the literature. To be specific,...
详细信息
This paper presents an interesting relation between the Maximum Agreement Subtree (MAST) problem and the Tree Edit Distance (TED) problem, both of which have been intensively studied in the literature. To be specific, we show that, for an arbitrary tree edit distance metric that is a derivative of the Tai tree edit distance metric, there exists a MAST-like pattern extraction problem, named Mostly Adjusted Sub-Forest (MASF) problem, such that computing a distance between trees x and y is equivalent to finding an optimal pattern shared between x and y. The MASF problem is different from the MAST problem in: (1) A pattern of the MASF problem may be a forest instead of a tree;(2) Instead of requiring exact match of labels, a pattern of a MASF problem is multi-labeled, and our flexible matching rule only requires that the label set of a vertex of a pattern includes the label of the corresponding vertex in a target tree;(3) To control ambiguity of matching caused by the flexible rule, the objective function includes a penalty function. As an application of this generic framework, we equate the Lu* tree edit distance metric with a pattern extraction problem, named the Mostly Adjusted Agreement Sub-Tree (MAAST) problem. The MAAST problem aims to find optimal agreement subtrees under the flexible matching rule and is solved in O(n(2)root dlogd)-time,where n and d are the size and the minimum degree of the input trees. The MAST problem requires O(n(2.5))-time computation. (C) 2014 Elsevier B.V. All rights reserved.
Soft-constraint semi-supervised affinity propagation (SCSSAP) adds supervision to the affinity propagation (AP) clustering algorithm without strictly enforcing instance-level constraints. Constraint violations lead to...
详细信息
Soft-constraint semi-supervised affinity propagation (SCSSAP) adds supervision to the affinity propagation (AP) clustering algorithm without strictly enforcing instance-level constraints. Constraint violations lead to an adjustment of the AP similarity matrix at every iteration of the proposed algorithm and to addition of a penalty to the objective function. This formulation is particularly advantageous in the presence of noisy labels or noisy constraints since the penalty parameter of SCSSAP can be tuned to express our confidence in instance-level constraints. When the constraints are noiseless, SCSSAP outperforms unsupervised AP and performs at least as well as the previously proposed semi-supervised AP and constrained expectation maximization. In the presence of label and constraint noise, SCSSAP results in a more accurate clustering than either of the aforementioned established algorithms. Finally, we present an extension of SCSSAP which incorporates metric learning in the optimization objective and can further improve the performance of clustering.
We describe a simple linear time algorithm to construct a quasi-kernel in a digraph and to find three quasi-kernels in digraphs without a kernel (giving constructive proofs of known results of Chvatal and Lovasz, or J...
详细信息
We describe a simple linear time algorithm to construct a quasi-kernel in a digraph and to find three quasi-kernels in digraphs without a kernel (giving constructive proofs of known results of Chvatal and Lovasz, or Jacob and Meyniel). However, we show that it is NP-complete to decide if there is a quasi-kernel containing a specified vertex in a given digraph. The algorithm provides also a simple proof of the characterization of digraphs with only two quasi-kernels given by Gutin, Koh, Tay and Yeo. (C) 2015 Elsevier B.V. All rights reserved.
A clique of a graph G is defined as a complete subgraph maximal under inclusion and having at least two vertices. A clique-transversal set D of G is a subset of vertices of G such that D meets all cliques of G. The cl...
详细信息
A clique of a graph G is defined as a complete subgraph maximal under inclusion and having at least two vertices. A clique-transversal set D of G is a subset of vertices of G such that D meets all cliques of G. The clique-transversal set problem is to find a minimum clique-transversal set of G. In this paper we present a polynomial time algorithm for the clique-transversal set problem on claw-free graphs with degree at most 4. (C) 2014 Elsevier B.V. All rights reserved.
We give an algorithm that computes the pathwidth of a given directed graph D in 3(tau(D))n(0(1)) time where n is the number of vertices of D and tau(D) is the number of vertices of a minimum vertex cover of the underl...
详细信息
We give an algorithm that computes the pathwidth of a given directed graph D in 3(tau(D))n(0(1)) time where n is the number of vertices of D and tau(D) is the number of vertices of a minimum vertex cover of the underlying undirected graph of D. This result extends that of [5] for undirected graphs to directed graphs. Moreover, our algorithm is based on a standard dynamic programming with a simple tree-pruning trick, which is extremely simple and easy to implement. An additional advantage of our algorithm is that a minimum vertex cover appears in the analysis but not in the algorithm itself, in contrast to the algorithm of [5] which needs to compute a minimum vertex cover. (C) 2014 Elsevier B.V. All rights reserved.
For a vertex v is an element of V (G), the incidence neighborhood of v, denoted by IN(v), is the set {(v, e):e is an element of E(G) and v is incident with e} boolean OR {(u, e) : e = vu is an element of E(G)}. Let S-...
详细信息
For a vertex v is an element of V (G), the incidence neighborhood of v, denoted by IN(v), is the set {(v, e):e is an element of E(G) and v is incident with e} boolean OR {(u, e) : e = vu is an element of E(G)}. Let S-sigma (v) denote the set of colors assigned to IN(v) in G under incidence coloring sigma and s(sigma) = max{vertical bar S-sigma (v)vertical bar : v is an element of V(G)}. Let G(1)square G(2) denote the Cartesian product of graphs G(1) and G(2). Let cri be an incidence coloring of graph G(i) and n(sigma(i)) the number of colors used by sigma(i) for i is an element of {1, 2}. In this paper, we show that if n(sigma(1)) >= n(sigma(2)) - s(sigma(2)), then there exists an incidence coloring of G(1)square G(2) which uses n(sigma(1)) + s(sigma(2)) colors;otherwise, there exists an incidence coloring of G(1)square G(2) using n(sigma(2)) colors. Based on the result above, we settle the following conjecture in affirmative: For integer p >= 1, chi(i)(Q(n)) = {n + 1 if n = 2(p) - 1 n + 2 otherwise, where Q(n) is the n-dimensional hypercube and chi(i)(Q(n)) is the incidence coloring number of Q(n). (C) 2015 Elsevier B.V. All rights reserved.
Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell states that might be captured by single-cell RNA seq. Existing computational approaches for the assessment of cell-stat...
详细信息
Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell states that might be captured by single-cell RNA seq. Existing computational approaches for the assessment of cell-state hierarchies from single-cell data can be formalized under a general framework composed of (i) a metric to assess cell-to-cell similarities (with or without a dimensionality reduction step) and (ii) a graph-building algorithm (optionally making use of a cell clustering step). The Sincell R package implements a methodological toolbox allowing flexible workflows under such a framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA seq. graphical representations and functional association tests are provided to interpret hierarchies. The functionalities of Sincell are illustrated in a real case study, which demonstrates its ability to discriminate noisy from stable cell-state hierarchies.
We consider a model of user engagement in social networks, where each player incurs a cost to remain engaged but derives a benefit proportional to the number of engaged neighbors. The natural equilibrium of this model...
详细信息
We consider a model of user engagement in social networks, where each player incurs a cost to remain engaged but derives a benefit proportional to the number of engaged neighbors. The natural equilibrium of this model corresponds to the k-core of the social network-the maximal induced subgraph with minimum degree at least k. We introduce the problem of "anchoring" a small number of vertices to maximize the size of the corresponding anchored k-core-the maximal induced subgraph in which every nonanchored vertex has degree at least k. This problem corresponds to preventing "unraveling"-a cascade of iterated withdrawals-and it identifies the individuals whose participation is most crucial to the overall health of a social network. We classify the computational complexity of this problem as a function of k and of the graph structure. We provide polynomial-time algorithms for general graphs with k = 2 and for bounded-treewidth graphs with arbitrary k. We prove strong inapproximability results for general graphs and k >= 3.
We propose introducing modern parallel programming paradigms to secure computation, enabling their secure execution on large datasets. To address this challenge, we present graphSC, a framework that (i) provides a pro...
详细信息
ISBN:
(纸本)9781467369497
We propose introducing modern parallel programming paradigms to secure computation, enabling their secure execution on large datasets. To address this challenge, we present graphSC, a framework that (i) provides a programming paradigm that allows non-cryptography experts to write secure code;(ii) brings parallelism to such secure implementations;and (iii) meets the needs for obliviousness, thereby not leaking any private information. Using graphSC, developers can efficiently implement an oblivious version of graph-based algorithms (including sophisticated data mining and machine learning algorithms) that execute in parallel with minimal communication overhead. Importantly, our secure version of graph-based algorithms incurs a small logarithmic overhead in comparison with the non-secure parallel version. We build graphSC and demonstrate, using several algorithms as examples, that secure computation can be brought into the realm of practicality for big data analysis. Our secure matrix factorization implementation can process 1 million ratings in 13 hours, which is a multiple order-of-magnitude improvement over the only other existing attempt, which requires 3 hours to process 16K ratings.
We tackle the problem of counting the number qk of k-cliques in large-scale graphs, for any constant k ≥ 3. Clique counting is essential in a variety of applications, including social network analysis. Our algorithms...
详细信息
We tackle the problem of counting the number qk of k-cliques in large-scale graphs, for any constant k ≥ 3. Clique counting is essential in a variety of applications, including social network analysis. Our algorithms make it possible to compute qk for several real-world graphs and shed light on its growth rate as a function of k. Even for small values of k, the number qk of k-cliques can be in the order of tens or hundreds of trillions. As k increases, different graph instances show different behaviors: while on some graphs qk+1 < qk, on other benchmarks qk+1 蠑 qk, up to two orders of magnitude in our observations. graphs with steep clique growth rates represent particularly tough instances in practice. Due to the computationally intensive nature of the clique counting problem, we settle for parallel solutions in theMapReduce framework, which has become in the last few years a de facto standard for batch processing of massive datasets. We give both theoretical and experimental contributions. On the theory side, we design the first exact scalable algorithm for counting (and listing) k-cliques in MapReduce. Our algorithm uses O(m3/2) total space and O(mk/2) work, where m is the number of graph edges. This matches the best-known bounds for triangle listing when k = 3 and is work optimal in the worst case for any k, while keeping the communication cost independent of *** also design sampling-based estimators that can dramatically reduce the running time and space requirements of the exact approach, while providing very accurate solutions with high probability. We then assess the effectiveness of different clique counting approaches through an extensive experimental analysis over the Amazon EC2 platform, considering both our algorithms and their state-of-the-art competitors. The experimental results clearly highlight the algorithm of choice in different scenarios and prove our exact approach to be the mos
暂无评论