ISBN: 9781467383165 (print)
As an important branch of frequent pattern mining, approximate frequent pattern (AFP) mining has attracted much attention recently. Various algorithms have been proposed to discover long true AFPs in the presence of random noise. This paper considers the key issues of AFP mining in noisy databases and categorizes previous approaches according to how they cope with missing items in transactions. A study of different data models for AFP mining is then presented, in which their merits and defects are analyzed. Finally, we draw conclusions and propose some solutions to the open problems in AFP mining.
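To make "coping with missing items" concrete, here is a minimal sketch of one common relaxation, in which a transaction still counts toward the support of an itemset if it lacks at most a tolerated share of the items. The function name, the tolerance parameter, and the toy transactions are illustrative assumptions and are not taken from the paper.

```python
def approximate_support(transactions, itemset, max_missing_fraction):
    """Share of transactions that contain all but a tolerated part of `itemset`.

    Assumption: noise is modelled as items missing from a transaction, and
    a transaction may miss at most floor(max_missing_fraction * |itemset|) items.
    """
    items = set(itemset)
    allowed_missing = int(max_missing_fraction * len(items))
    hits = sum(1 for t in transactions if len(items - set(t)) <= allowed_missing)
    return hits / len(transactions)


# Hypothetical noisy database: the pattern {a, b, c, d} is partly erased.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "d"},       # "c" lost to noise
    {"a", "c", "d"},       # "b" lost to noise
    {"b", "e"},
]
pattern = {"a", "b", "c", "d"}

exact = approximate_support(transactions, pattern, 0.0)     # exact matching
relaxed = approximate_support(transactions, pattern, 0.25)  # one missing item allowed
print(exact, relaxed)
```

With exact matching the long pattern is supported by a single transaction, while the relaxed count recovers the two transactions damaged by noise.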
Term weighting is a strategy that assigns weights to terms in order to improve the performance of text categorization. In this paper, we propose a new category-based term weighting scheme named the probability of rele...
Graph pattern matching is a hot topic in the big data era; its goal is to find answer graphs in a data graph of a graph database that match a given query graph. “Matching” means that two graphs satisfy some relation, such as isomorphism, simulation, or bisimulation. Since few algorithms exist for subgraph bisimulation, our work aims to solve the graph pattern matching problem involving bisimulation relations through model checking. We characterize query graphs by modal formulas. By model checking the formulas on the data graphs, the answer graphs bisimilar to the query graphs can be discovered. We add * to the basic modal logic language, resulting in the ML + * language, and add r* to form ML + * formulas. We then put forward a theorem stating that ML + * formulas characterize finite directed graphs modulo bisimulation. Furthermore, we list the steps to find answer graphs bisimilar to a query graph.
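As background on the bisimulation relation itself, the following sketch computes the largest bisimulation between a small query graph and a data graph by naive fixpoint refinement. This is a plain textbook procedure, not the model checking approach of the paper, and the graph encoding (a label dictionary plus a successor dictionary) and all node names are assumptions made for illustration.

```python
def maximal_bisimulation(labels_q, succ_q, labels_g, succ_g):
    """Largest bisimulation between two finite node-labelled directed graphs."""
    # Start from every pair of nodes that carry the same label.
    relation = {(u, v) for u in labels_q for v in labels_g
                if labels_q[u] == labels_g[v]}
    changed = True
    while changed:
        changed = False
        for (u, v) in list(relation):
            # Forth: every move of u must be answered by some move of v.
            forth = all(any((u2, v2) in relation for v2 in succ_g.get(v, ()))
                        for u2 in succ_q.get(u, ()))
            # Back: every move of v must be answered by some move of u.
            back = all(any((u2, v2) in relation for u2 in succ_q.get(u, ()))
                       for v2 in succ_g.get(v, ()))
            if not (forth and back):
                relation.discard((u, v))
                changed = True
    return relation


# Hypothetical query graph: q0 (label A) -> q1 (label B).
labels_q = {"q0": "A", "q1": "B"}
succ_q = {"q0": {"q1"}}

# Hypothetical data graph: g0 (A) -> g1 (B), g0 -> g2 (B), g3 (A) -> g4 (C).
labels_g = {"g0": "A", "g1": "B", "g2": "B", "g3": "A", "g4": "C"}
succ_g = {"g0": {"g1", "g2"}, "g3": {"g4"}}

relation = maximal_bisimulation(labels_q, succ_q, labels_g, succ_g)
print(sorted(relation))
```

In this toy case g3 is rejected as a match for q0 because its only successor carries label C, so the move from q0 to q1 cannot be answered.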
Frequent pattern mining is commonly used to generate combined-feature candidates, yet many of them are non-discriminative and thus may be useless for predictive models. In this paper, we propose to use feature combinations derived from frequent patterns to obtain more accurate multiclass classification models. Specifically, we present a novel mathematical inference that shows which feature combinations are discriminative. Based on it, an efficient algorithm is proposed for mining and selecting discriminative patterns. Experimental results on twenty UCI datasets demonstrate that the proposed method improves classification performance remarkably compared with other baseline methods. Moreover, an internal evaluation is employed to validate the strong discriminative power of our feature combinations.
The reconstruction method for network forensics scenarios has grown into a mature and rich technology that provides advanced means of obtaining the chain of evidence. Statistical analyses of intrusion logs that are presented as evidence in court are often refuted as baseless and inadmissible, without regard to the effort spent on them; that effort goes into generating the reports whether or not the evidence is well grounded. This paper therefore presents a scenario reconstruction method that incorporates the Viterbi algorithm: the most likely sequence of meta evidence is acquired and replaces the original meta evidence, and, combined with the suspected evidence, it yields the chain of evidence. However, the parameters of the Viterbi algorithm are derived from the Baum-Welch (B-W) algorithm, which easily falls into local optima. An Adaptive Genetic Algorithm (AGA) is therefore used to estimate the parameters of the hidden Markov model (HMM), for which a chromosome coding method and genetic operations are designed. The experimental results show that this method can accurately reproduce the crime scene of a network intrusion, compared with the network forensic evidence fusion method based on the HMM. The method has been applied to a forensics system and has obtained good results.
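For readers unfamiliar with the decoding step, here is a minimal sketch of the standard Viterbi algorithm, which returns the most likely hidden state sequence of an HMM for a given observation sequence. The state names, observation names, and probabilities below are invented for illustration; in the paper the parameters are estimated by the AGA, which this sketch does not cover.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for an HMM."""
    # best[t][s] = (probability of the best path ending in s at time t, predecessor)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            prob, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], back)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical model: hidden attacker phases and observed log events.
states = ["probe", "intrude"]
start_p = {"probe": 0.6, "intrude": 0.4}
trans_p = {"probe": {"probe": 0.7, "intrude": 0.3},
           "intrude": {"probe": 0.4, "intrude": 0.6}}
emit_p = {"probe": {"normal": 0.5, "port_scan": 0.4, "root_login": 0.1},
          "intrude": {"normal": 0.1, "port_scan": 0.3, "root_login": 0.6}}

path, prob = viterbi(["normal", "port_scan", "root_login"],
                     states, start_p, trans_p, emit_p)
print(path, prob)
```

The sketch multiplies raw probabilities, which is fine for short sequences; for long logs, summing log-probabilities avoids numerical underflow.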
An authentication scheme that uses rehashing and secret sharing methods to verify the reality of a color image’s content is proposed in this paper. First, the proposed scheme uses Shamir’s secret sharing method to s...
There has been growing interest in alignment-free methods for whole-genome comparison and phylogenomic studies. In this study, we propose an alignment-free method for phylogenetic tree construction using whole-proteome sequences. Based on the inter-amino-acid distances, we first convert the whole-proteome sequences into inter-amino-acid distance vectors, which are called observed inter-amino-acid distance profiles. Then, we propose to use conditional geometric distribution profiles (the distributions of sequences in which the amino acids are placed randomly and independently) as the reference distribution profiles. Finally, the relative deviation between the observed and reference distribution profiles is used to define a simple metric that reflects the phylogenetic relationships between the whole-proteome sequences of different organisms. We name our method inter-amino-acid distances and conditional geometric distribution profiles (IAGDP). We evaluate our method on two data sets: a benchmark dataset of 29 genomes used in previously published papers, and another of 67 mammal genomes. Our results demonstrate that the new method is useful and efficient.
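The idea of comparing an observed distance profile with a random reference can be sketched as follows. The abstract does not give the exact definitions, so this sketch makes loud assumptions: distances are gaps between consecutive occurrences of one amino acid, the reference is a plain (unconditioned) geometric distribution whose parameter is the frequency of that amino acid, and the deviation is (observed - reference) / reference. The toy sequence and all function names are hypothetical.

```python
from collections import Counter


def distance_profile(sequence, residue, max_distance):
    """Relative frequency of each gap 1..max_distance between consecutive
    occurrences of `residue`. Longer gaps count toward the total but are
    not listed."""
    positions = [i for i, c in enumerate(sequence) if c == residue]
    gaps = Counter(b - a for a, b in zip(positions, positions[1:]))
    total = sum(gaps.values())
    return [gaps[d] / total if total else 0.0 for d in range(1, max_distance + 1)]


def geometric_profile(p, max_distance):
    """P(gap = d) = p * (1 - p)**(d - 1) for residues placed independently."""
    return [p * (1 - p) ** (d - 1) for d in range(1, max_distance + 1)]


def relative_deviation(observed, reference):
    """Element-wise (observed - reference) / reference."""
    return [(o - r) / r for o, r in zip(observed, reference)]


sequence = "AKAALGVA"                         # toy peptide, not real data
p = sequence.count("A") / len(sequence)       # frequency of alanine: 0.5
observed = distance_profile(sequence, "A", 4)
reference = geometric_profile(p, 4)
deviation = relative_deviation(observed, reference)
print(observed, reference, deviation)
```

A full pipeline would repeat this for all 20 amino acids, concatenate the deviations into one vector per proteome, and feed pairwise distances between those vectors to a tree-building method.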
Latent Dirichlet allocation (LDA), a popular unsupervised tool for dimensionality reduction, has been applied in text mining and information retrieval. Belief propagation is competitive in both speed and accuracy compa...
ISBN: 9781467396769 (print)
Optic Disk (OD) detection plays an important role in fundus image analysis. In this paper, we propose an algorithm for detecting the OD based mainly on a classifier model trained by structured learning. We then use the model to obtain the edge map of the OD. Thresholding is performed on the edge map to obtain a binary image. Finally, a circle Hough transform is carried out to approximate the boundary of the OD by a circle. The proposed algorithm has been evaluated on a public database and obtained promising results. The results (an area overlap and Dice coefficient of 0.8636 and 0.9196, respectively, an accuracy of 0.9770, and true positive and false positive fractions of 0.9212 and 0.0106) show that the proposed method is a robust tool for the segmentation of the OD and is very competitive with state-of-the-art methods.
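The two segmentation scores quoted above can be illustrated with a short sketch. It assumes the usual definitions, area overlap as intersection over union and the Dice coefficient as twice the intersection over the sum of both areas, applied to flattened binary masks; the masks are toy data and the function name is hypothetical.

```python
def overlap_and_dice(predicted, truth):
    """Return (area overlap, Dice coefficient) for two flattened binary masks.

    Assumes at least one of the masks is non-empty.
    """
    intersection = sum(1 for p, t in zip(predicted, truth) if p and t)
    predicted_area = sum(1 for p in predicted if p)
    truth_area = sum(1 for t in truth if t)
    union = predicted_area + truth_area - intersection
    return intersection / union, 2 * intersection / (predicted_area + truth_area)


# Toy masks: 1 marks a pixel labelled as optic disk.
predicted = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]

overlap, dice = overlap_and_dice(predicted, truth)
print(overlap, dice)
```

For a single image the Dice coefficient is never smaller than the area overlap, which matches the ordering of the two reported values.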