This paper considers various numerical functions that measure the degree of similarity between two finite sequences. These similarity measures are based on the concept of an embedding for sequences, which we define here. A subsequence is a special case of such an embedding. Other cases additionally require equal distances between adjacent symbols of a subsequence in both sequences; this generalizes the concept of a substring, in which all such distances are equal to one. Moreover, equality of distances from the beginning of the sequences to the first embedded symbol, or from the last embedded symbol to the end of the sequences, may be required. In addition to the last two cases, an embedding can occur in the sequence more than once. In the literature, functions such as the number of common embeddings or the number of pairs of occurrences of embeddings in a sequence are used. We introduce three additional functions: the sum of the lengths of common embeddings, the sum of the minimum numbers of occurrences of a common embedding in both sequences, and a similarity function based on the longest common embedding. In total, we consider 20 numerical functions; for 17 of these functions, algorithms (including new ones) of polynomial complexity are proposed; for two functions, algorithms of exponential complexity with a reduced exponent are proposed. In the Conclusions, we briefly compare these embeddings and functions.
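As an illustration only, the difference between the subsequence case and the case with equal distances between adjacent symbols can be sketched in Python. The abstract does not give the paper's formal definitions or algorithms, so both functions below are assumptions: `lcs_length` is the textbook dynamic program for the longest common subsequence, and `longest_gap_preserving` (a hypothetical name) reads "equal distances between adjacent symbols in both sequences" as requiring all matched positions to share one offset between the two sequences.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence (standard DP, O(len(x) * len(y)))."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_gap_preserving(x, y):
    """Longest common embedding whose adjacent symbols are equally spaced in x and y.

    Assumption: equal gaps in both sequences mean every matched pair of
    positions (i, j) satisfies j - i == d for a single offset d, so the
    answer is the maximum number of matches over all offsets.
    """
    best = 0
    for d in range(-(len(x) - 1), len(y)):
        matches = sum(
            1
            for i in range(len(x))
            if 0 <= i + d < len(y) and x[i] == y[i + d]
        )
        best = max(best, matches)
    return best
```

Under these assumptions, for `x = "abc"` and `y = "axbxxc"` the subsequence measure gives 3 (the whole of `abc`), while the gap-preserving measure gives 1, because no single offset aligns more than one symbol.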