Background: The center string (or closest string) problem is a classic computer science problem with important applications in computational biology. Given k input strings and a distance threshold d, we search for a string within Hamming distance at most d of each input string. This problem is NP-complete. Results: In this paper, we focus on exact methods for the problem that are also fast in practice. We first introduce data reduction techniques that allow us to infer that certain instances have no solution, or that a center string must satisfy certain conditions. We describe how to use this information to speed up two previously published search tree algorithms. Then, we describe a novel iterative search strategy that is efficient in practice, and to which some of our reduction techniques can also be applied. Finally, we present results of an evaluation study on two different data sets from a biological application. Conclusions: We find that the running time for computing the optimal center string is dominated by the subroutine calls for d = d_opt - 1 and d = d_opt. Our data reduction is very effective for both, either rejecting unsolvable instances or solving trivial positions. We find that this speeds up computations considerably.
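To make the problem statement concrete, the following is a minimal Python sketch of the decision problem for a fixed d, wrapped in a loop that increases d until a center is found. It is not the paper's algorithm: the function names are hypothetical, the search is exhaustive, and the only reductions used are two elementary ones (rejecting an instance when two inputs are more than 2d apart, and restricting each position to characters that actually occur in that column, which fixes positions where all inputs agree).

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def infeasible_by_pairs(strings, d):
    """Reject an instance when two inputs are more than 2*d apart.

    By the triangle inequality, no string can then be within distance d
    of both of them.
    """
    return any(
        hamming(s, t) > 2 * d
        for i, s in enumerate(strings)
        for t in strings[i + 1:]
    )


def center_for_threshold(strings, d):
    """Return a string within distance d of every input, or None.

    Exhaustive search; only sensible for tiny illustrative instances.
    """
    if infeasible_by_pairs(strings, d):
        return None
    # A character that occurs in no input at a position can be swapped for
    # one that does without increasing any distance, so each position only
    # needs the characters seen in its column. Columns where all inputs
    # agree therefore have a single choice.
    columns = [sorted(set(col)) for col in zip(*strings)]
    for candidate in product(*columns):
        if all(hamming(candidate, s) <= d for s in strings):
            return "".join(candidate)
    return None


def optimal_center(strings):
    """Try d = 0, 1, 2, ... and return the first (d, center) that works."""
    for d in range(len(strings[0]) + 1):
        center = center_for_threshold(strings, d)
        if center is not None:
            return d, center


if __name__ == "__main__":
    print(optimal_center(["ACGT", "ACGA", "ACTT"]))
```

In this sketch the calls for small d are typically rejected by the pairwise check alone, so the expensive work happens only near the optimal threshold.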
Assume that a tuple of binary strings $\bar{a} = \langle a_1, \ldots, a_n \rangle$ has negligible mutual information with another string $b$. Does this mean that properties of the Kolmogorov complexity of $\bar{a}$ do not change significantly if we relativize them to $b$? This question becomes very nontrivial when we try to formalize it. In this paper we investigate this problem for a special class of properties (properties that can be expressed by an $\exists$-formula). In particular, we show that a random (conditional on $\bar{a}$) oracle $b$ does not help to extract common information from the strings $a_i$.
ISBN: 9781424481262 (print)
At the current state of the art, genetic programs do not contain two constructs that commonly occur in programs written by humans, namely loops and functions with parameters. In this paper we describe an investigation into the evolution of programs for a problem that can only be solved by evolving a parameterised program with one or more loops. We provide training examples of the desired program behaviour for a number of problem sizes and require the evolution of a program P(n) that will give the correct output for any value of n. We have chosen a problem, that of reproducing a binary string to a given number of bits, that can be made harder or easier by adjusting various aspects of the formulation. We are interested in seeing which formulations lead to success and which do not. We conclude that programs with parameters and loops can be successfully evolved if the search space is appropriately restricted by (1) grammars which restrict the possible program structures, (2) limits on program depth and (3) limits on the range of random constants.
ISBN: 9781617388767 (print)
Quantizing real-valued templates into binary strings is a fundamental step in biometric compression and template protection. In this paper, we introduce the area under the FRR curve optimized bit allocation (AUF-OBA) principle. Given the bit error probability, AUF-OBA assigns the number of quantization bits to every feature in such a way that the analytical area under the false rejection rate (FRR) curve for a Hamming distance classifier (HDC) is minimized. Experiments on the FRGC face database yield good performance.
ISBN: 9783540748267 (print)
In this paper, we discuss in detail the implementation of a genetic algorithm for calculating the Hausdorff measure of the Sierpinski gasket with compression ratio 1/2, mainly covering the encoding and decoding method, the generation of the initial population, and the fitness computation. The experimental results show that the genetic algorithm is an effective and universal method for calculating the Hausdorff measure.
The String Barcoding (SBC) problem, introduced by Rash and Gusfield (RECOMB, 2002), consists in finding a minimum set of substrings that can be used to distinguish between all members of a set of given strings. In a computational biology context, the given strings represent a set of known viruses, while the substrings can be used as probes for a hybridization experiment via microarray. Ultimately, one aims at the classification of new strings (unknown viruses) through the result of the hybridization experiment. In this paper we show that SBC is as hard to approximate as Set Cover. Furthermore, we show that the constrained version of SBC (with probes of bounded length) is also hard to approximate. These negative results are tight.
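The distinguishing condition in SBC can be illustrated with a short Python sketch: each string receives a signature recording which probes occur in it as substrings, and a probe set is valid when all signatures differ. The function names and the brute-force search are assumptions made for illustration only; given the hardness results above, exhaustive search is feasible only for toy inputs.

```python
from itertools import combinations


def signature(text, probes):
    """Tuple of booleans: which probes occur in `text` as substrings."""
    return tuple(p in text for p in probes)


def distinguishes_all(strings, probes):
    """True when every string has a unique probe signature."""
    signatures = [signature(s, probes) for s in strings]
    return len(set(signatures)) == len(signatures)


def candidate_probes(strings, max_len):
    """All substrings of the inputs with length between 1 and max_len."""
    subs = set()
    for s in strings:
        for i in range(len(s)):
            for j in range(i + 1, min(i + max_len, len(s)) + 1):
                subs.add(s[i:j])
    return sorted(subs)


def smallest_barcode(strings, max_len):
    """Brute-force a minimum probe set (bounded probe length), or None."""
    probes = candidate_probes(strings, max_len)
    for size in range(len(probes) + 1):
        for subset in combinations(probes, size):
            if distinguishes_all(strings, list(subset)):
                return list(subset)
    return None


if __name__ == "__main__":
    print(smallest_barcode(["AC", "AG", "CG"], max_len=1))
```

The `max_len` parameter corresponds to the constrained version with probes of bounded length.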
This paper presents a novel method for the automated type synthesis of planar mechanisms and multibody systems. The method explicitly includes topology as a design variable in an optimization framework based on a genetic algorithm (GA). Each binary string genome of the GA represents the concatenation of the upper-right triangular portion of the link adjacency matrix of a mechanism. Different topologies can be explored by the GA by applying genetic operators to the genomes. The evolutionary process is not dependent on the results obtained from enumeration. Two examples of topology-based optimization show the applicability of this method to mechanism type synthesis problems. This method is distinct from others in the literature in that it represents the first fully automated algorithm for solving a general type synthesis problem with the help of a numeric optimizer.
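The genome encoding described above can be sketched in Python as follows. The sketch assumes that the genome lists the strictly upper triangle of the link adjacency matrix row by row (diagonal excluded) and that the matrix is symmetric; the abstract does not state the ordering, and the function names are hypothetical.

```python
def decode_genome(bits, n_links):
    """Rebuild a symmetric link adjacency matrix from a binary genome.

    Assumes `bits` holds the strictly upper triangle in row-major order,
    so it must have n_links * (n_links - 1) / 2 characters.
    """
    expected = n_links * (n_links - 1) // 2
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")
    adjacency = [[0] * n_links for _ in range(n_links)]
    k = 0
    for i in range(n_links):
        for j in range(i + 1, n_links):
            adjacency[i][j] = adjacency[j][i] = int(bits[k])
            k += 1
    return adjacency


def encode_genome(adjacency):
    """Concatenate the strictly upper triangle into a binary string."""
    n = len(adjacency)
    return "".join(
        str(adjacency[i][j]) for i in range(n) for j in range(i + 1, n)
    )


if __name__ == "__main__":
    # Four links joined in a closed loop (a four-bar linkage).
    for row in decode_genome("101101", 4):
        print(row)
```

Under this encoding, flipping a single bit adds or removes one joint between two links, which is how bitwise genetic operators can move between topologies.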
ISBN: 0780393635 (print)
In order to enhance the efficiency of genetic algorithms, it is important to identify a linkage set, i.e. a set of loci tightly linked to construct a building block. In this paper, we propose a novel linkage identification method for real-valued strings called the real-valued Dependency Detection for Distribution Derived from df (rD^5). It can detect linkage sets with a quasi-linear number of fitness evaluations. The rD^5 is based on the D^5, which was proposed for binary strings. It detects dependencies between loci by estimating the distribution of strings classified according to fitness differences. The rD^5 and the LINC-R, a linkage identification method proposed elsewhere, provide approximately equivalent information about the function to be solved; however, the rD^5 performs a smaller number of fitness evaluations than the LINC-R for larger functions. Although estimation of distribution algorithms (EDAs) also estimate the distribution of strings, it is difficult for EDAs to solve a function composed of exponentially scaled sub-functions. The proposed method, by contrast, can be applied to such a function in the same way as to a function composed of uniformly scaled sub-functions, which is easy for EDAs. We perform experiments to compare the proposed method with the LINC-R and to examine the stability of the rD^5 under scaling effects. We also investigate two parameters: one that defines the amount of perturbation (mutation) and one that defines the quantization level.
ISBN: 0780393635 (print)
We discuss the evolution of cooperative behavior in the iterated prisoner's dilemma (IPD) game under random pairing in game playing. The main characteristic feature of this paper is the use of the random pairing scheme, in which each player plays against a different randomly chosen opponent at every round of the dilemma game. Each player has a single-round memory strategy represented by a binary string of length five. The next action of a player is determined by its strategy based on the result of its previous round of the dilemma game. First we perform computational experiments to examine the evolution of cooperative behavior under the random pairing scheme using various parameter specifications. Experimental results show that the evolution of cooperative behavior is difficult regardless of the parameter specifications for the population size, the crossover probability, and the mutation probability. It is also shown that slightly better results (i.e., higher payoffs) are obtained from smaller populations. Then we demonstrate the possibility of the evolution of cooperative behavior under the random pairing scheme in the case of a spatial IPD model where each player is located in a cell of a two-dimensional grid-world.
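A single-round memory strategy of length five can be sketched in Python as below. The bit layout is an assumption, since the abstract does not specify it: bit 0 is the opening move and bits 1 to 4 give the response to the four possible previous outcomes (own action, opponent action), with 1 meaning cooperate and 0 meaning defect. The payoff values are the commonly used 3/0/5/1 and are likewise assumed; all names are hypothetical.

```python
from typing import Optional, Tuple

# Assumed payoff table, keyed by (action_a, action_b); 1 = cooperate, 0 = defect.
PAYOFF = {
    (1, 1): (3, 3),
    (1, 0): (0, 5),
    (0, 1): (5, 0),
    (0, 0): (1, 1),
}


def next_action(strategy: str, previous: Optional[Tuple[int, int]]) -> int:
    """Look up the next move from a five-bit strategy string.

    `previous` is (own_action, opponent_action) from the player's last
    round, or None on the first round.
    """
    if previous is None:
        return int(strategy[0])
    own, opp = previous
    return int(strategy[1 + 2 * own + opp])


def play_round(strategy_a, prev_a, strategy_b, prev_b):
    """Play one round; each player reacts to its own previous outcome."""
    act_a = next_action(strategy_a, prev_a)
    act_b = next_action(strategy_b, prev_b)
    return (act_a, act_b), PAYOFF[(act_a, act_b)]


if __name__ == "__main__":
    tit_for_tat = "10101"    # cooperate first, then copy the opponent
    always_defect = "00000"
    print(play_round(tit_for_tat, None, always_defect, None))
```

Under random pairing, `prev_a` and `prev_b` generally come from rounds against different opponents, which is why the two players' memories are passed separately.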
ISBN: 0780393635 (print)
Learning Classifier Systems (LCS) traditionally use a binary string rule representation with wildcards added to allow for generalizations over the problem encoding. We have previously presented a neural network-based representation to aid their use in complex problem domains. Here each rule's condition and action are represented by a small neural network, evolved through the actions of the genetic algorithm. In this paper we present results from the use of backpropagation to provide local search in conjunction with the global search of the genetic algorithm within XCS, creating a memetic neural LCS. Significant decreases in the time taken to reach optimal behaviour are obtained from the incorporation of this local learning algorithm.
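For contrast with the neural representation, the traditional rule representation mentioned in the first sentence can be sketched in Python. The sketch assumes the usual ternary alphabet in which "#" is the wildcard, and represents a rule as a hypothetical (condition, action) pair; it illustrates matching only and is not part of XCS as described in the paper.

```python
def matches(condition, state):
    """True when every non-wildcard position of the condition equals the input."""
    if len(condition) != len(state):
        raise ValueError("condition and state must have the same length")
    return all(c == "#" or c == s for c, s in zip(condition, state))


def match_set(rules, state):
    """Collect the (condition, action) rules whose condition matches the state."""
    return [rule for rule in rules if matches(rule[0], state)]


if __name__ == "__main__":
    rules = [("1#0", 1), ("11#", 0), ("0##", 1)]
    print(match_set(rules, "110"))
```

A condition with more "#" symbols covers more inputs, which is the generalization over the problem encoding that the wildcards provide.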