The closest string problem and the closest substring problem are both natural theoretical computer science problems with important applications in computational biology. Given n input strings, the closest string (substring) problem asks for a new string within distance d of (a substring of) each input string such that d is minimized. Both problems are NP-complete. In this paper we propose new algorithms for these two problems. For the closest string problem, we develop an exact algorithm with time complexity O(n|Σ|^O(d)), where Σ is the alphabet. This improves the previously best known result of O(nd^O(d)) and yields a polynomial-time algorithm when d = O(log n). Using this algorithm, a polynomial-time approximation scheme (PTAS) for the closest string problem is also given, with time complexity O(n^O(ε^-2)), improving the previously best known O(n^O(ε^-2 log 1/ε)) PTAS. A new algorithm for the closest substring problem is also proposed. Finally, we prove that a restricted version of the closest substring problem has the same parameterized complexity as the closest substring problem, answering an open question in the literature.
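The objective here (minimize the maximum Hamming distance from a center string to all inputs) is easy to state in code. A minimal sketch for intuition only, not the paper's parameterized algorithm: it exhaustively searches Σ^L, which is feasible only for toy instances.

```python
from itertools import product

def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def closest_string(strings, alphabet):
    """Brute-force center: the string minimizing the maximum
    Hamming distance to every input (tiny instances only)."""
    L = len(strings[0])
    best, best_d = None, L + 1
    for cand in product(alphabet, repeat=L):
        cand = "".join(cand)
        d = max(hamming(cand, s) for s in strings)
        if d < best_d:
            best, best_d = cand, d
    return best, best_d
```

The paper's contribution is precisely to avoid this |Σ|^L blow-up, paying only |Σ|^O(d) on top of a linear factor.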
Recently, with the rapid increase in the number of web services, QoS-aware Web Service Composition (QWSC) has become a popular topic in both industry and academia. Meta-heuristic algorithms, as an effective way to solve classical optimization problems, have been successfully applied to QWSC. However, such approaches have intrinsic drawbacks and usually lack good performance in large-scale scenarios. For example, some meta-heuristic algorithms are suitable for continuous search spaces, while the search space of QWSC is discrete. To solve these problems, which are commonly faced when applying meta-heuristic algorithms to QWSC, we first introduce a preprocessing approach for constructing fuzzy continuous neighborhood relations among concrete services, which makes the local search strategy of meta-heuristic algorithms as effective in a discrete space as in a continuous one, thus improving optimization performance. Second, we combine the Harris Hawks Optimization (HHO) algorithm with logistic chaotic sequences to propose an improved meta-heuristic algorithm named CHHO for solving QWSC. The ergodic and chaotic characteristics of chaotic sequences are used to enhance the ability of CHHO to jump out of local optima for further optimization. Experimental results show that CHHO achieves better optimization performance than existing mainstream algorithms when solving QWSC problems. Additionally, the preprocessing approach not only greatly improves the optimization performance of CHHO but can also be freely utilized in other meta-heuristic-based approaches.
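The abstract does not spell out how the fuzzy continuous neighborhood relations are built. One simple way to give discrete concrete services a continuity structure, sketched here as a hypothetical illustration with an assumed per-service QoS scoring function `qos`, is to order services by that score so that nearby continuous coordinates decode to services with similar QoS:

```python
def build_neighborhood(services, qos):
    """Order concrete services by an aggregate QoS score so that a
    continuous coordinate in [0, 1) maps to a service whose list
    neighbors have similar QoS (a stand-in for a fuzzy relation)."""
    ordered = sorted(services, key=qos)

    def decode(x):
        # clamp a continuous search coordinate onto the ordered list
        i = min(int(x * len(ordered)), len(ordered) - 1)
        return ordered[max(i, 0)]

    return ordered, decode
```

With such a decoder, a continuous-space meta-heuristic's small moves translate into moves between QoS-similar services, which is the effect the paper's preprocessing aims for.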
We study the problem of graph summarization. Given a large graph we aim at producing a concise lossy representation (a summary) that can be stored in main memory and used to approximately answer queries about the original graph much faster than by using the exact representation. In this work we study a very natural type of summary: the original set of vertices is partitioned into a small number of supernodes connected by superedges to form a complete weighted graph. The superedge weights are the edge densities between vertices in the corresponding supernodes. To quantify the dissimilarity between the original graph and a summary, we adopt the reconstruction error and the cut-norm error. By exposing a connection between graph summarization and geometric clustering problems (i.e., k-means and k-median), we develop the first polynomial-time approximation algorithms to compute the best possible summary of a certain size under both measures. We discuss how to use our summaries to store a (lossy or lossless) compressed graph representation and to approximately answer a large class of queries about the original graph, including adjacency, degree, eigenvector centrality, and triangle and subgraph counting. Using the summary to answer queries is very efficient as the running time to compute the answer depends on the number of supernodes in the summary, rather than the number of nodes in the original graph.
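The summary construction itself (superedge weight = edge density between a supernode pair, or within a supernode) is straightforward to state in code. A minimal sketch for an undirected graph, assuming the partition is given as lists of vertices:

```python
def summarize(edges, partition):
    """Build the complete weighted summary graph: the weight of
    superedge (i, j) is the density of original edges between
    supernodes i and j (within supernode i when i == j)."""
    node2sn = {v: i for i, part in enumerate(partition) for v in part}
    k = len(partition)
    counts = [[0] * k for _ in range(k)]
    for u, v in edges:
        a, b = node2sn[u], node2sn[v]
        counts[a][b] += 1
        if a != b:
            counts[b][a] += 1
    weights = {}
    for i in range(k):
        for j in range(i, k):
            if i == j:
                n = len(partition[i])
                pairs = n * (n - 1) // 2   # possible intra-supernode edges
            else:
                pairs = len(partition[i]) * len(partition[j])
            weights[(i, j)] = counts[i][j] / pairs if pairs else 0.0
    return weights
```

An approximate adjacency query then simply reads the density of the superedge joining the two endpoints' supernodes, in time independent of the original graph's size.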
A linear algorithm for two-dimensional (2-D) least square (LS) approximation in the frequency domain is presented. The algorithm is based on the equation error model. The approximation yields a 2-D rational function in the complex variables, or equivalently a 2-D autoregressive, moving-average (ARMA) process. The proposed two-dimensional, least square, frequency domain (2D-LS-FD) algorithm can efficiently represent 2-D signals or images. It is also capable of accurately modeling 2-D linear and shift invariant (LSI) stable systems, when the model has a sufficient order relative to the unknown and the identification noise is negligible. This paper will also discuss, with proofs, the important existence, uniqueness and convergence properties associated with this technique. Simulation examples for signal and system modeling are given to show the excellent performance of the algorithm. In addition, the successful application of the developed algorithm to image noise cancellation is also presented.
The emerging network function virtualization is migrating traditional middleboxes, e.g., firewalls, load balancers, proxies, from dedicated hardware to virtual network functions (VNFs) running on commercial servers defined as network points of presence (N-PoPs). VNFs further chain up into more complex network services called service function chains (SFCs). SFCs introduce new flexibility and scalability, which greatly reduce the expenses and rollout time of network services. However, chasing the lowest cost may lead to congestion on popular N-PoPs and links, thus resulting in performance degradation or violation of service-level agreements. To address this problem, we propose a novel scheme that reduces the operating cost and controls network congestion at the same time. It does so by placing VNFs and routing flows among them jointly. Since the problem is NP-hard, we design an approximation algorithm named candidate path selection (CPS) with a theoretical performance guarantee. We then consider cases where SFC demands fluctuate frequently. We propose an online candidate path selection (OCPS) algorithm to handle such cases considering the VNF migration cost. OCPS is designed to preserve good performance under various migration costs and prediction errors. Extensive simulation results highlight that the CPS and OCPS algorithms perform better than baselines and comparably to the optimal solution.
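CPS itself is not specified in the abstract, but the underlying placement problem for a single chain, i.e. assigning each stage's VNF to a candidate N-PoP while paying node costs and inter-stage link costs, can be illustrated with a Viterbi-style dynamic program over the chain. This is a generic sketch under assumed cost tables, not the paper's approximation algorithm:

```python
def place_chain(stages, cost, link):
    """DP over an SFC: best[p] = cheapest way to place stages[0..i]
    with stage i on N-PoP p. stages[i] lists candidate N-PoPs,
    cost[i][p] is the node cost, link[(p, q)] the inter-stage cost."""
    best = {p: cost[0][p] for p in stages[0]}
    back = []
    for i in range(1, len(stages)):
        nxt, choice = {}, {}
        for q in stages[i]:
            p = min(best, key=lambda p: best[p] + link[(p, q)])
            nxt[q] = best[p] + link[(p, q)] + cost[i][q]
            choice[q] = p
        back.append(choice)
        best = nxt
    end = min(best, key=best.get)
    path = [end]
    for choice in reversed(back):
        path.append(choice[path[-1]])
    path.reverse()
    return best[end], path
```

The joint problem in the paper is harder because many flows share N-PoP and link capacities, which is what makes congestion control and an approximation guarantee necessary.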
Kernel-based learning has well-documented merits in various machine learning tasks. Most kernel-based learning approaches rely on a pre-selected kernel, the choice of which presumes task-specific prior information. In addition, most existing frameworks assume that data are collected centrally and in batch. Such a setting may not be feasible, especially for large-scale data sets that are collected sequentially over a network. To cope with these challenges, the present work develops an online multi-kernel learning scheme to infer the intended nonlinear function 'on the fly' from data samples that are collected in distributed locations. To address communication efficiency among distributed nodes, we study the effects of quantization and develop a distributed and quantized online multiple kernel learning algorithm. We provide a regret analysis indicating that our algorithm is capable of achieving sublinear regret. Numerical tests on real datasets show the effectiveness of our algorithm.
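As a rough single-node illustration of the multi-kernel idea (not the paper's distributed, quantized algorithm), an online multi-kernel regressor can keep one functional estimate per kernel and mix the predictions with exponential weights driven by each kernel's loss:

```python
import math

def omkl(stream, kernels, eta=0.5, lr=0.1):
    """Online multi-kernel regression sketch: every kernel maintains
    its own functional estimate; exponential weights combine them."""
    K = len(kernels)
    w = [1.0 / K] * K              # combination weights, one per kernel
    support, out = [], []          # support: (sample x, per-kernel step sizes)
    for x, y in stream:
        preds = [sum(a[i] * kernels[i](sx, x) for sx, a in support)
                 for i in range(K)]
        out.append(sum(w[i] * preds[i] for i in range(K)))
        # multiplicative weight update on each kernel's squared loss
        w = [w[i] * math.exp(-eta * (preds[i] - y) ** 2) for i in range(K)]
        z = sum(w)
        w = [wi / z for wi in w]
        # functional gradient step: store the residual-scaled sample
        support.append((x, [lr * (y - preds[i]) for i in range(K)]))
    return out, w
```

Sublinear-regret guarantees for such schemes hinge on exactly this combination of per-kernel online updates and an exponential-weights mixer; the paper's contribution adds the distributed setting and quantized communication.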
Author: Y. G. Saab, Dept. of Comput. Sci., Missouri Univ., Columbia, MO, USA
Partitioning is a fundamental problem in diverse fields of study such as pattern recognition, parallel processing, and the design of VLSI circuits. Recently, node clustering or compaction has been shown by several authors to enhance the performance of iterative partitioning algorithms. However, clustering has mainly been used as a preprocessing step before partitioning in existing methods. This paper describes a technique to extract clusters using information collected during a pass of an iterative exchange algorithm. Alternative approaches for the implementation of this new clustering technique are discussed, and one such approach is chosen to be incorporated in a modified Fiduccia-Mattheyses algorithm based on a tradeoff between run time and performance. The resulting algorithm, BISECT, performs well in comparison with variants of the Kernighan-Lin algorithm, including the Fiduccia-Mattheyses algorithm, local approaches, and simulated annealing, on a wide variety of real and randomly generated benchmarks. BISECT is also used to find small vertex separators, and its results are compared with previous methods on several benchmarks. The empirical results show that BISECT is stable and is not very sensitive to the initial partition. Under suitably mild assumptions, BISECT can be shown to run in linear time. The empirical results confirm the speed of BISECT, which can partition very large graphs (12,598 nodes and 91,961 edges) in less than six minutes of CPU time on a Sun Sparc 1+ workstation.
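BISECT builds on the Fiduccia-Mattheyses pass, whose cell gains (how much the cut shrinks if a cell moves to the other side) drive each iteration. A small illustrative implementation of that gain rule for a 2-way netlist partition, not BISECT itself:

```python
def fm_gains(nets, side):
    """Fiduccia-Mattheyses cell gains for a bipartition.
    nets: iterable of nets, each a tuple of cell ids.
    side: dict mapping each cell to 0 or 1.
    gain[c] = (nets uncut if c moves) - (nets newly cut if c moves)."""
    gain = {c: 0 for c in side}
    for net in nets:
        count = {0: 0, 1: 0}
        for c in net:
            count[side[c]] += 1
        for c in net:
            s = side[c]
            if count[s] == 1:        # c is alone on its side: moving uncuts the net
                gain[c] += 1
            if count[1 - s] == 0:    # net entirely on c's side: moving cuts it
                gain[c] -= 1
    return gain
```

An FM pass repeatedly moves the unlocked cell with the highest gain (subject to balance), updating gains incrementally; the clustering technique described above harvests information from exactly such passes.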
The robustness of filter bank multi-carrier (FBMC) against delays is appealing for asynchronous grant-free massive connectivity systems. In this paper, we study the joint activity detection, delay and channel estimation (JADDCE) problem for FBMC-based uplink massive connectivity with asynchronous transmission and frequency-selective fading (FSF) channels. We formulate JADDCE as a compressed sensing (CS) problem to fully exploit the sparsity structure and propose an efficient algorithm based on generalized approximate message passing (GAMP) to solve it. Besides, since parameters such as the noise variance and activity probability may not be perfectly known by the receiver, we introduce the expectation maximization (EM) method into the algorithm and derive the updating rules for the unknown parameters. We also utilize the analysis framework based on average mutual information (AMI) to find a theoretical upper bound on the channel estimation performance. Simulation results show satisfactory detection and estimation performance of the proposed algorithm under FSF channels. Moreover, the channel estimation performance can approach the theoretical upper bound.
With the rapid development of the Internet of Things (IoT), large quantities of data have been generated. Due to the limitation of network bandwidth, the time and energy consumption of data transmission are increased. Data feature information can be extracted in real time by deploying a data processing center. In this article, a novel dimension reduction approach is proposed for edge computing. First, a four-layer data processing framework is designed for data acquisition. A task assignment algorithm (TAA) is used for the condition when an edge node stops working due to an accident. Second, a threshold strategy is proposed to filter the data and reduce the dimension. Finally, a dimension reduction algorithm based on adaptive maximum linear neighborhood selection (AMLNS) is proposed. The harmonic geodesic distance is introduced to avoid deformation of the manifold structure in the AMLNS algorithm. In particular, multiple weights are used to construct the linear structure, which yields a better embedding effect than a single weight. The maximum linear neighborhood error weight is used to calculate the data coordinates. Experimental results show that the TAA improves the task completion rate by about 15% and 36% over the random assignment method in the mobile layer and edge layer, respectively. Compared with local linear embedding (LLE), the point distribution of AMLNS is more uniform and regular, and the execution time of AMLNS is reduced by about 17%. Furthermore, the embedding errors are less than those of LLE.
The acquired industrial data often contain missing outputs because of the irregularities of complicated industrial environments, which make the outputs of the training dataset incomplete. In this paper, a semi-supervised sparse Bayesian regression model is proposed for dealing with the incomplete-outputs problem by employing a variational inference technique. Within the settings of specific hierarchical priors over the missing outputs, we derive the posterior probability distribution over the uncertain variables, including the missing outputs. Given that the posterior distribution is not analytically tractable, a hybrid learning procedure is designed that combines variational inference with a gradient-based method to obtain optimal approximate posteriors. To verify the performance of the proposed method, a number of comparative experiments are conducted and analyzed using datasets (both artificial and real-world) with different proportions of missing outputs. Compared with existing semi-supervised regression approaches, the experiments demonstrate the effectiveness of the proposed method.