ISBN (print): 9781665404358
The Submodular Cover problem has attracted the attention of researchers because of its wide variety of applications in economics, machine learning, digital marketing, and computer science. Previous studies on this problem have focused on solving it under the assumption of a noise-free environment, or on using the greedy algorithm to solve it under noise. However, in some applications the data is large-scale and noisy, so existing solutions are ineffective or inapplicable. Motivated by this phenomenon, we study the Submodular Cover under Noise (SCN) problem and propose a single-pass streaming algorithm that provides a bicriteria approximation solution for SCN. The experimental results indicate that our algorithm provides solutions with high objective function values and outperforms the state-of-the-art algorithm in terms of both the number of queries and running time.
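To make the single-pass setting concrete, here is a minimal sketch of a generic threshold rule for streaming submodular cover. It is not the SCN algorithm from the abstract: the threshold `tau`, the function `threshold_stream_cover`, and the toy coverage instance are all hypothetical, and the oracle `f` may be exact or noisy.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List

SetFunction = Callable[[FrozenSet[Hashable]], float]

def threshold_stream_cover(stream: Iterable[Hashable], f: SetFunction,
                           target: float, tau: float) -> List[Hashable]:
    """One pass over the stream. An element is kept if its marginal gain,
    as reported by the (possibly noisy) oracle f, is at least tau; the pass
    stops early once the reported value reaches target."""
    chosen: List[Hashable] = []
    current: FrozenSet[Hashable] = frozenset()
    value = f(current)
    for e in stream:
        if value >= target:
            break  # coverage requirement already met
        gain = f(current | {e}) - value  # marginal gain of e w.r.t. current
        if gain >= tau:
            chosen.append(e)
            current = current | {e}
            value = f(current)
    return chosen

# Usage: a coverage function, f(S) = number of items covered by S.
SETS = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 2, 3, 4}}

def coverage(S: FrozenSet[Hashable]) -> float:
    return float(len(set().union(*(SETS[e] for e in S))))

print(threshold_stream_cover(["a", "b", "c", "d"], coverage, target=4, tau=1))
# prints ['a', 'b', 'c']
```

Here the single element "d" would have met the target alone, so the streamed solution is larger than the optimum; bounding this kind of gap (together with any relaxation of the coverage target) is what a bicriteria guarantee does.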
Given an undirected graph G on n nodes and m edges in the form of a data stream, we study the problem of finding an Euler tour in G. Our main result is the first one-pass streaming algorithm that computes an Euler tour of G in the form of an edge successor function with only $O(n \log n)$ RAM, which is optimal for this setting (e.g. Sun and Woodruff (2015)). Since the output can be much larger than the RAM, we use a write-only tape to output the solution gradually. The previously best-known result for finding Euler tours in data streams is implicitly given by the W-stream algorithm of Demetrescu et al. (2010), which uses $O(m/n)$ passes under the same RAM limitation. Our approach is to partition the edges into edge-disjoint cycles and to merge the cycles until a single Euler tour is obtained. In the streaming environment such merging is far from obvious, because the limited RAM allows only a constant number of cycles to be processed at once. This forces the merging of cycles that are partially no longer present in RAM. We solve this problem with a new edge-swapping technique, for which storing two specific edges per node suffices to merge tours without having all tour edges in RAM. The mathematical key is to model tours and their merging algebraically, where certain equivalence classes represent subtours. This quite general approach may also be of interest for other routing problems.
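The basic operation behind merging cycles in a successor-function representation can be shown in RAM: if two edge-disjoint closed walks both pass through a vertex v, exchanging the successors of one edge entering v from each walk splices them into a single walk. The sketch below illustrates only this splice; it is not the paper's streaming edge-swapping technique, and the names `cycle_successors`, `splice`, and `walk` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (tail, head) in traversal direction

def cycle_successors(first_id: int, length: int) -> Dict[int, int]:
    """Successor function of one closed walk whose edges have consecutive ids."""
    return {first_id + i: first_id + (i + 1) % length for i in range(length)}

def splice(edges: List[Edge], succ: Dict[int, int], a: int, b: int) -> None:
    """Merge two edge-disjoint closed walks that both pass through a vertex v.
    a and b are edges entering v, one from each walk; exchanging their
    successors turns the two walks into one."""
    assert edges[a][1] == edges[b][1], "a and b must enter the same vertex"
    succ[a], succ[b] = succ[b], succ[a]

def walk(succ: Dict[int, int], start: int) -> List[int]:
    """Follow successors from start until the walk closes."""
    tour, e = [start], succ[start]
    while e != start:
        tour.append(e)
        e = succ[e]
    return tour

# Two triangles sharing vertex 0: A = 0-1-2-0 (ids 0..2), B = 0-3-4-0 (ids 3..5).
edges: List[Edge] = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
succ = {**cycle_successors(0, 3), **cycle_successors(3, 3)}
splice(edges, succ, a=2, b=5)  # edges (2, 0) and (4, 0) both enter vertex 0
print(walk(succ, 0))  # prints [0, 1, 2, 3, 4, 5]
```

After the swap, every edge's head equals its successor's tail and all six edges lie on one closed walk, i.e. an Euler tour of this small graph.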
Submodular optimization has received considerable attention in recent years, and the problem of maximizing submodular and non-submodular functions on the integer lattice in particular has attracted much recent interest. In this paper, we study streaming algorithms for maximizing a monotone non-submodular function subject to a cardinality constraint on the integer lattice. For a monotone non-submodular function $f:\mathbb{Z}_+^n \to \mathbb{R}_+$ defined on the integer lattice with diminishing-return (DR) ratio $\gamma$, we present a one-pass streaming algorithm that gives a $(1-\frac{1}{2^{\gamma}}-\varepsilon)$-approximation and requires at most $O(k\varepsilon^{-1}\log k/\gamma)$ space and $O(\varepsilon^{-1}\log k/\text{***}\,\|B\|_\infty)$ update time per element. We then modify the algorithm and improve the memory complexity to $O(k/(\gamma\varepsilon))$. To the best of our knowledge, this is the first streaming algorithm on the integer lattice for this constrained maximization problem.
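On the integer lattice each streamed element arrives with a budget $B_e$, so the algorithm must choose how many copies to add, not just whether to add the element. A minimal single-threshold sketch of that decision follows. It assumes a fixed threshold `tau` (in sieve-style algorithms this would be derived from a guess of OPT) and uses a plain downward scan; the function and the toy objective are hypothetical and are not the paper's algorithm.

```python
from typing import Callable, Dict, Iterable, Tuple

Vector = Dict[str, int]  # sparse integer vector x on the lattice

def threshold_lattice_stream(stream: Iterable[Tuple[str, int]],
                             f: Callable[[Vector], float],
                             k: int, tau: float) -> Vector:
    """One pass over (element, budget B_e) pairs with a fixed threshold tau.
    For each element, add the largest count c (scanning downward) whose total
    marginal gain is at least c * tau, keeping the total size within k."""
    x: Vector = {}
    size = 0
    for e, budget in stream:
        if size >= k:
            break  # cardinality budget exhausted
        base = f(x)
        for c in range(min(budget, k - size), 0, -1):
            trial = dict(x)
            trial[e] = trial.get(e, 0) + c
            if f(trial) - base >= c * tau:
                x, size = trial, size + c
                break
    return x

# Usage: f(x) = sum_e w_e * min(x_e, 3), a monotone function on the lattice.
W = {"a": 1.0, "b": 2.0}

def capped(x: Vector) -> float:
    return sum(W[e] * min(v, 3) for e, v in x.items())

print(threshold_lattice_stream([("a", 5), ("b", 5)], capped, k=4, tau=2.0))
# prints {'b': 3}
```

The downward scan costs up to $\min(B_e, k)$ oracle calls per element; replacing it with a search over c is how update times logarithmic in $\|B\|_\infty$ are typically obtained.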
A basic assumption of the Health Level Seven (HL7) protocol is 'no limitation of message length'. However, most existing commercial HL7 interface engines do limit message length, because they parse HL7 messages with the string-array method, which runs entirely in main memory. In particular, messages containing image and multimedia data create a long string array and can thus cause critical or fatal system failures. Consequently, such engines cannot handle the image and multimedia data needed in modern medical records. This study aims to solve this problem with a 'streaming algorithm' method. The new HL7 message-parsing method uses a character-stream object that processes data character by character between main memory and the hard disk, thereby alleviating the processing load on main memory. The main functions of the new engine are generating, parsing, validating, browsing, sending, and receiving HL7 messages; the engine can also parse and generate XML-formatted HL7 messages. The new HL7 engine successfully exchanged HL7 messages containing 10-megabyte images and discharge-summary information between two university hospitals. Published by Elsevier Ireland Ltd.
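The contrast between the string-array method and character-stream parsing can be sketched as follows, assuming HL7 v2 pipe-delimited encoding with '|' as the field separator and a carriage return ending each segment. The function `stream_segments` is hypothetical and is not the engine described above: it keeps only the current segment in memory, whereas a complete engine would also spool very large fields (such as embedded images) to disk instead of buffering them.

```python
import io
from typing import Iterator, List, TextIO

def stream_segments(source: TextIO, chunk_size: int = 4096) -> Iterator[List[str]]:
    """Read an HL7 v2 message in fixed-size chunks and yield one segment at a
    time, split into fields. Only the current segment is buffered in memory."""
    buffer: List[str] = []
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if ch in "\r\n":  # segment terminator (CR; tolerate LF/CRLF too)
                if buffer:
                    yield "".join(buffer).split("|")
                    buffer = []
            else:
                buffer.append(ch)
    if buffer:  # last segment without a trailing terminator
        yield "".join(buffer).split("|")

message = "MSH|^~\\&|SENDER|RECEIVER\rPID|1||12345\rOBX|1|ED|IMAGE\r"
for segment in stream_segments(io.StringIO(message), chunk_size=8):
    print(segment[0], len(segment))
```

LF is accepted as a terminator because files opened in Python text mode with the default newline handling translate CR to LF. Splitting MSH on '|' shifts its field numbering by one, since MSH-1 is the separator character itself.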
ISBN (print): 9781450357944
The frequency distribution of k-mers (substrings of length k in a DNA/RNA sequence) is very useful for many bioinformatics applications that use next-generation sequencing (NGS) data, such as de Bruijn graph based assembly, read error correction, genome size prediction, and digital normalization. In tools for such applications, counting (or estimating) low-frequency k-mers is a pre-processing phase. However, computing the k-mer frequency histogram becomes computationally challenging for large-scale genomic data. We present KmerEstimate, a streaming algorithm that approximates the number of k-mers with a given frequency in a genomic data set. Our algorithm is based on a well-known adaptive-sampling streaming algorithm due to Bar-Yossef et al. for approximating the number of distinct elements in a data stream. We implemented and tested our algorithm on several data sets. Its results are better than those of other streaming approaches used so far for this problem (notably ntCard, the state-of-the-art streaming approach), with an error rate within 0.6%. It also uses less memory than ntCard, since its sample size is almost 85% smaller. In addition, our algorithm has provable approximation and space-usage guarantees, and we show certain space complexity lower bounds. The source code of our algorithm is available at https://***/srbehera11/KmerEstimate.
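The adaptive-sampling idea can be sketched in a few lines: a k-mer is sampled at level L when its hash has at least L trailing zero bits, sampled k-mers are counted exactly, and the level rises whenever the sample exceeds its capacity. This is a simplified illustration, not the KmerEstimate code: the hash choice (BLAKE2b from `hashlib`), the function names, and the omission of reverse-complement canonicalization are all assumptions.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable

def trailing_zeros(h: int) -> int:
    """Number of trailing zero bits of a 64-bit hash (64 if h == 0)."""
    return (h & -h).bit_length() - 1 if h else 64

def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (hypothetical choice: BLAKE2b)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def estimate_histogram(reads: Iterable[str], k: int, capacity: int) -> Dict[int, int]:
    """Adaptive sampling: at level L a k-mer is sampled iff its hash has at least
    L trailing zeros (probability 2**-L). Sampled k-mers are counted exactly;
    when more than `capacity` distinct k-mers are held, L is raised and k-mers
    that no longer qualify are evicted. Returns {frequency: estimated count}."""
    level = 0
    sample: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if trailing_zeros(kmer_hash(kmer)) < level:
                continue  # not in the current sample
            sample[kmer] += 1
            while len(sample) > capacity:
                level += 1
                for s in [s for s in sample if trailing_zeros(kmer_hash(s)) < level]:
                    del sample[s]
    histogram = Counter(sample.values())
    # Each sampled k-mer stands for about 2**level k-mers of the same frequency.
    return {freq: n * 2 ** level for freq, n in histogram.items()}

print(estimate_histogram(["ACGTACGT"], k=3, capacity=100))  # prints {2: 2, 1: 2}
```

Counts of sampled k-mers are exact because the level only increases: a k-mer that qualifies at the current level also qualified at every earlier level, so none of its occurrences were skipped.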
Due to its broad applications, maximizing a diminishing-return submodular function subject to a knapsack constraint has been studied extensively in recent years. In this paper, we mainly consider this problem on the integer lattice. Assuming the optimal value is known, we first design a one-pass algorithm and prove that its approximation ratio is $1/3-\varepsilon$. Since the optimal value is difficult to know in practice, we then design a two-pass streaming algorithm whose first pass finds the maximum value over unit vectors to estimate the range of OPT. Furthermore, to improve performance, we design an online algorithm called DynamicMRT that reduces the number of passes, eventually achieving an approximation ratio of $1/3-\varepsilon$, memory complexity $O(K\log K/\varepsilon)$, and query complexity $O(\log^2 K/\varepsilon)$ per element, where $K$ is the knapsack constraint.
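The first pass of such a two-pass scheme can be sketched as computing $m = \max_e f(\chi_e)$ and then laying a geometric grid of OPT guesses over the resulting range; the second pass would run one threshold candidate per guess. The sketch assumes $m \le \mathrm{OPT} \le K m$, which holds for example when $f(\mathbf{0}) = 0$, every unit costs at least 1, and every single unit fits in the budget $K$. The function names are hypothetical and this is not DynamicMRT.

```python
import math
from typing import Callable, Iterable, List

def max_unit_value(stream: Iterable[str], f_unit: Callable[[str], float]) -> float:
    """First pass: m = max over elements e of f(chi_e), the value of one unit of e."""
    return max((f_unit(e) for e in stream), default=0.0)

def opt_guesses(m: float, K: float, eps: float) -> List[float]:
    """Geometric grid (1+eps)^i spanning [m, K*m] under the assumption
    m <= OPT <= K*m. floor/ceil add one extra point at each end to absorb
    rounding, so some guess v satisfies v <= OPT <= (1+eps)*v."""
    if m <= 0:
        return []
    base = 1.0 + eps
    lo = math.floor(math.log(m, base))
    hi = math.ceil(math.log(K * m, base))
    return [base ** i for i in range(lo, hi + 1)]

unit_values = {"a": 3.0, "b": 5.0, "c": 4.0}
m = max_unit_value(unit_values, lambda e: unit_values[e])
guesses = opt_guesses(m, K=10, eps=0.5)
print(m, len(guesses))
```

The grid has $O(\log K/\varepsilon)$ points, which is where a $\log K/\varepsilon$ factor in the memory bound of this style of algorithm comes from.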
In this article, we devise streaming algorithms for maximizing a monotone submodular function subject to a cardinality constraint on the integer lattice. Based on the observation that lattice submodularity is not equivalent to diminishing-return submodularity on the integer lattice but is rather a weaker condition, we propose a one-pass streaming algorithm that uses a modified binary search as a subroutine at each step. Finally, we show that the algorithm achieves an approximation ratio of $1/2-\varepsilon$, memory complexity $O(\varepsilon^{-1}k\log k)$, and per-element query complexity $O(\varepsilon^{-2}\log^2 k)$.
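For reference, here is the unmodified binary search that such a subroutine starts from: find the largest number of copies c whose average marginal gain is at least a threshold, using $O(\log B)$ oracle calls. It is only correct when the average gain $g(c)/c$ is non-increasing in c, which DR-submodularity guarantees; as the abstract notes, lattice submodularity is weaker, which is why the paper needs a modified search. The function `max_copies` and the `gain` callback are hypothetical.

```python
from typing import Callable

def max_copies(gain: Callable[[int], float], budget: int, tau: float) -> int:
    """Largest c in [0, budget] with gain(c) >= c * tau, by binary search.
    gain(c) is the marginal value of adding c copies of the current element.
    Assumes gain(c) / c is non-increasing in c (true for DR-submodular f),
    so the predicate is true up to some point and false afterwards."""
    lo, hi = 0, budget  # invariant: the predicate holds at lo (trivially at 0)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if gain(mid) >= mid * tau:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Usage: each copy is worth 10 until 3 copies, then nothing more.
print(max_copies(lambda c: 10 * min(c, 3), budget=10, tau=6.0))  # prints 5
```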
The Submodular Cover (SC) problem has attracted the attention of researchers because of its wide variety of applications in many domains. Previous studies on this problem have focused on solving it under the assumption of a noise-free environment or on using the greedy algorithm to solve it under noise. However, in some applications the data is large-scale and noisy, so existing solutions are ineffective or not applicable. Motivated by this phenomenon, we study the Submodular Cover under Noises (SCN) problem and propose two efficient streaming algorithms that provide solutions with theoretical bounds under two common noise models: multiplicative and additive noise. The experimental results indicate that our proposed algorithms not only provide solutions with high objective function values but also outperform the state-of-the-art algorithm in terms of both the number of queries and running time.
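The two noise models can be pictured as wrappers around an exact oracle $f$: multiplicative noise returns $\xi f(S)$ with $\xi \in [1-\epsilon, 1+\epsilon]$, and additive noise returns $f(S) + \xi$ with $|\xi| \le \epsilon$. The sketch below draws $\xi$ uniformly, which is an assumption made for illustration; the paper's noise distributions may differ, and the wrapper names are hypothetical.

```python
import random
from typing import Callable, FrozenSet, Hashable

SetFn = Callable[[FrozenSet[Hashable]], float]

def multiplicative_noise(f: SetFn, eps: float, rng: random.Random) -> SetFn:
    """Noisy oracle returning xi * f(S), xi uniform in [1 - eps, 1 + eps] (assumed)."""
    return lambda S: rng.uniform(1 - eps, 1 + eps) * f(S)

def additive_noise(f: SetFn, eps: float, rng: random.Random) -> SetFn:
    """Noisy oracle returning f(S) + xi, xi uniform in [-eps, eps] (assumed)."""
    return lambda S: f(S) + rng.uniform(-eps, eps)

# Usage: wrap an exact cardinality function and query it once per model.
exact: SetFn = lambda S: float(len(S))
rng = random.Random(0)
S = frozenset({1, 2, 3})
print(multiplicative_noise(exact, 0.1, rng)(S), additive_noise(exact, 0.1, rng)(S))
```

Because each query draws fresh noise, evaluating the same set twice can return different values, so an algorithm that compares noisy marginal gains must tolerate errors that scale with $f(S)$ in the first model and are bounded by a constant in the second.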
Many practical problems require deciding not only whether an element is selected but also to what extent it is selected, which poses a challenge for submodular *** this study, we consider monotone, nondecreasing, non-submodular maximization on the integer lattice with a *** first design a two-pass streaming algorithm by refining the estimation interval of the optimal *** element, the algorithm not only decides whether to save the element but also determines the number of ***, we introduce binary search as a subroutine to reduce the time ***, we obtain a one-pass streaming algorithm by dynamically updating the estimation interval of the optimal ***, we improve the memory complexity of this algorithm.
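Dynamically updating the estimation interval in a single pass can be sketched as follows: track the largest single-unit value m seen so far, keep one candidate per guess $(1+\epsilon)^i$ inside the current interval, discard guesses that fall below it when m grows, and open new ones above it. The interval $[m, k m/\gamma]$ is an assumed bound on OPT for a function with DR ratio $\gamma$ under a cardinality constraint $k$; the class and its fields are hypothetical, not the paper's algorithm.

```python
import math
from typing import Dict, List

class DynamicGuesses:
    """One-pass maintenance of OPT guesses v = (1+eps)^i. With m the largest
    single-unit value seen so far, guesses are kept for m <= v <= k*m/gamma
    (assumed bounds on OPT). Raising m discards guesses that fell below the
    interval and opens new ones above it; surviving candidates keep their state."""

    def __init__(self, k: int, gamma: float, eps: float):
        self.k, self.gamma, self.base = k, gamma, 1.0 + eps
        self.m = 0.0
        self.candidates: Dict[int, List[str]] = {}  # guess index -> partial solution

    def observe(self, unit_value: float) -> None:
        if unit_value <= self.m:
            return  # interval unchanged
        self.m = unit_value
        # floor/ceil keep one extra guess at each end to absorb rounding
        lo = math.floor(math.log(self.m, self.base))
        hi = math.ceil(math.log(self.k * self.m / self.gamma, self.base))
        self.candidates = {i: self.candidates.get(i, []) for i in range(lo, hi + 1)}

    def guess(self, i: int) -> float:
        return self.base ** i

# Usage: the interval shifts upward as larger unit values arrive.
g = DynamicGuesses(k=4, gamma=1.0, eps=1.0)
g.observe(1.0)
g.candidates[1].append("x")  # pretend guess 2.0 has started a partial solution
g.observe(2.0)
print(sorted(g.candidates), g.candidates[1])
```

Only a logarithmic number of guesses is alive at any time, and a partial solution is never restarted while its guess remains in range.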
In this paper, we study the problem of maximizing the Difference of two Submodular (DS) functions in the streaming model, where elements of the ground set arrive one at a time in an arbitrary order. We present one-pass streaming algorithms for both the unconstrained and the cardinality-constrained problems. Our analysis shows that the proposed algorithms produce solutions with provable approximation guarantees. To the best of our knowledge, this is the first theoretical guarantee for the DS maximization problem in the streaming model. In addition, we study streaming maximization under a cardinality constraint where the objective function is $\gamma$-weakly DR-submodular. We propose a one-pass streaming algorithm that achieves an approximation ratio of $\gamma/(1+\gamma)-\varepsilon$. Since the sum of a suBmodular and a suPermodular (BP) function can be regarded as a $(1-\kappa_g)$-weakly DR-submodular function, we obtain a $((1-\kappa_g)/(2-\kappa_g)-\varepsilon)$-approximation for cardinality-constrained BP maximization, where $\kappa_g$ is the curvature of the corresponding supermodular function. Our results improve the previous best approximation bounds.
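Reading the weakly DR-submodular ratio as $\gamma/(1+\gamma)-\varepsilon$, the BP bound follows by direct substitution of $\gamma = 1-\kappa_g$; the short check below uses only the two ratios stated in the abstract.

```latex
\gamma = 1-\kappa_g
\quad\Longrightarrow\quad
\frac{\gamma}{1+\gamma}-\varepsilon
= \frac{1-\kappa_g}{1+(1-\kappa_g)}-\varepsilon
= \frac{1-\kappa_g}{2-\kappa_g}-\varepsilon .
```

For $\kappa_g = 0$ (a modular supermodular part, so the objective is submodular and $\gamma = 1$), both expressions reduce to $1/2-\varepsilon$.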