ISBN (digital): 9783540734994
ISBN (print): 9783540734987
This paper addresses the relation extraction problem and proposes a method for discovering relations among entities when those relations are buried in differently nested structures of XML documents. The method first identifies and collects XML fragments that contain all the entity types specified by the user. It then computes the similarity between fragments based on the semantics of their tags and on their structures, and clusters the fragments by similarity so that fragments containing the same relation are grouped together. Finally, it extracts relation instances and the patterns of their occurrences from each cluster. Experimental results show that the method can correctly identify and extract relation information among the given entity types from all kinds of XML documents with meaningful tags.
ISBN (print): 9781424410651
To obtain more significant information, increasingly high-dimensional data are acquired from multi-channel sensors, but the amount of data becomes very large. The quantity of data must be reduced for processing and transmission. The central problem is how to extract the significant features from these data for these purposes. In this paper, an optimal discrimination plane (ODP) technique based on Fisher's criterion was developed to reduce the data. The patterns were projected onto the two orthogonal vectors that span the ODP, and the resulting two-dimensional feature vectors were used as features to represent the patterns. Electrocardiogram signals were analyzed as an example in this study. A quadratic discriminant function (QDF) based classifier and a threshold vector based classifier were each employed to measure the performance of the extracted features. The results show that the proposed ODP is an effective and feasible technique for extracting features from hyper-dimensional time series data.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Association rule mining often results in an overwhelming number of rules. In practice, it is difficult for the end user to select the most relevant rules. To tackle this problem, various interestingness measures have been proposed. Nevertheless, choosing an appropriate measure remains a hard task, and the use of several measures may lead to conflicting information. In this paper, we give a unified view of objective interestingness measures. We define a new framework embedding a large set of measures, called SBMs, and we prove that the SBMs behave similarly. Furthermore, we identify the whole collection of rules that simultaneously optimize all the SBMs. We provide an algorithm to efficiently mine a reduced set of rules among those optimizing all the SBMs. Experiments on real datasets highlight the characteristics of such rules.
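To make "objective interestingness measure" concrete, the sketch below computes three widely used measures (support, confidence and lift) for a rule X -> Y. It is only an illustration: the abstract does not define the SBM framework, so these three measures are not claimed to be SBMs, and the function name `rule_measures` and the toy transactions are assumptions made for the example.

```python
from fractions import Fraction


def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    Exact fractions are used so the values can be compared without
    floating-point noise. These are standard textbook measures, not the
    paper's SBM definitions.
    """
    x, y = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)         # transactions containing X
    n_y = sum(1 for t in transactions if y <= t)         # transactions containing Y
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # transactions containing X and Y
    if n == 0 or n_x == 0 or n_y == 0:
        raise ValueError("measures are undefined when X or Y never occurs")
    support = Fraction(n_xy, n)
    confidence = Fraction(n_xy, n_x)
    lift = confidence / Fraction(n_y, n)
    return support, confidence, lift


# Hypothetical toy data, made up for this example.
TOY = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
)]

support, confidence, lift = rule_measures(TOY, {"bread"}, {"milk"})
print(support, confidence, lift)
```

On this toy data the rule bread -> milk has high confidence but a lift slightly below 1, a small instance of the conflicting signals between measures that the abstract mentions.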
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Programs for gene prediction in computational biology are examples of systems for which authentic test data are difficult to acquire, as such data require years of extensive research. This has led to test methods based on semi-artificially produced test data, often generated by ad hoc techniques complemented by statistical models such as Hidden Markov Models (HMMs). The quality of such a test method depends on how well the test data reflect the regularities in known data and how well they generalize these regularities. So far, only very simplified and generalized artificial data sets have been tested, and a more thorough statistical foundation is required. We propose to use logic-statistical modelling methods for machine learning to analyze existing, manually marked-up data, integrated with the generation of new, artificial data. More specifically, we suggest using the PRISM system developed by Sato and Kameya. Based on logic programming extended with random variables and parameter learning, PRISM appears to be a powerful modelling environment that subsumes HMMs and a wide range of other methods, all embedded in a declarative language. We illustrate these principles here, showing parts of a model under development for genetic sequences, and report initial experiments producing test data for the evaluation of existing gene finders, exemplified by GENSCAN, HMMGene and ***.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Mining maximal frequent itemsets (MFIs) in data streams is more difficult than mining them in static databases because data streams are huge, high-speed and continuous. In this paper, we propose a novel one-pass algorithm called FpMFI-DS, which mines all maximal frequent itemsets in landmark windows or sliding windows of a data stream based on the FP-Tree. A new FP-Tree structure is designed for storing all transactions in a landmark or sliding window. To improve the efficiency of the algorithm, a new pruning technique, extension support equivalency pruning (ESEquivPS), is incorporated into it. The experiments show that our algorithm is efficient and scalable. It is suitable for mining MFIs both in static databases and in data streams.
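The notion of a maximal frequent itemset over a sliding window can be sketched as follows. This is a deliberately naive, brute-force illustration of the definition, not FpMFI-DS: it uses no FP-Tree and no ESEquivPS pruning, and the function names and the toy data are assumptions made for the example.

```python
from collections import deque
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Enumerate all itemsets contained in at least min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    frequent = []
    for size in range(1, len(items) + 1):
        level = [
            frozenset(candidate)
            for candidate in combinations(items, size)
            if sum(1 for t in transactions if set(candidate) <= t) >= min_count
        ]
        if not level:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
        frequent.extend(level)
    return frequent


def maximal_frequent_itemsets(transactions, min_count):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_count)
    return {s for s in frequent if not any(s < other for other in frequent)}


def sliding_window_mfis(stream, window_size, min_count):
    """Yield the MFIs of the last window_size transactions after each arrival."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(frozenset(transaction))
        yield maximal_frequent_itemsets(list(window), min_count)


# Hypothetical toy data, made up for this example.
TOY = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(maximal_frequent_itemsets(TOY, min_count=3))
```

Recomputing from scratch on every arrival is exponential in the number of items, which is exactly the cost that a compact tree structure and pruning are meant to avoid.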
ISBN (print): 9781424406715
Combinatorial Transcriptional Fluorescent In Situ Hybridization (CT-FISH) is a confocal fluorescence imaging technique enabling the detection of multiple active transcription units in individual interphase diploid nuclei. As improved combinatorial labeling methods allow simultaneous measurement of gene activities to expand from five genes in a single embryo or tissue section to upward of twenty genes, transforming image stacks into usable data becomes an increasingly labor-intensive task. In this paper we describe our progress towards a method for the computational analysis of confocal images from Drosophila melanogaster that involves the segmentation of the cell nuclei and of nascent transcription sites of specific genes. Using image processing and machine learning algorithms, we allow experimentalists to iteratively tune and improve the analysis system to reflect biological reality.
ISBN (print): 9781424410651
Based on the definition of the extensible set and the method for constructing its dependent function, a classification method under extension transformation, called the extension classification method, is studied. It differs from classification methods based on classical sets, fuzzy sets and rough sets in that it is an alterable classification method. According to a given transformation, it can divide a universe of discourse into five parts: positive extension field, negative extension field, positive stable field, negative stable field and extension boundary. Moreover, the universe of discourse and the dependent function, which describes the degree to which an object possesses a certain characteristic, are themselves alterable. This makes the classification more refined. The phenomenon that "there is a corresponding classification pattern for a given transformation" is explained from the viewpoint of set theory. Taking extension classification management of human resources as an example, the applied value of the method is illustrated. The classification method is a basic method of extension data mining, and it enriches the classification function of data mining.
ISBN (print): 9781424416301
As a fundamental problem in pattern recognition, graph matching has found a variety of applications in the field of computer vision. In graph matching, patterns are modeled as graphs, and pattern recognition amounts to finding a correspondence between the nodes of different graphs. The problem has been formulated in many ways, but most formulations can be cast as a quadratic assignment problem, in which a linear term in the objective function encodes node compatibility functions and a quadratic term encodes edge compatibility functions. Since the quadratic assignment problem is NP-hard, the main research focus in this area has been on designing efficient algorithms for solving it approximately. In this paper, we turn our attention to the complementary problem: how to estimate compatibility functions such that the solution of the resulting graph matching problem best matches the expected solution that a human would manually provide. We present a method for learning graph matching: the training examples are pairs of graphs and the "labels" are matchings between pairs of graphs. We present experimental results with real image data which give evidence that learning can improve the performance of standard graph matching algorithms. In particular, it turns out that linear assignment with such a learning scheme may improve over state-of-the-art quadratic assignment relaxations. This finding suggests that for a range of problems where quadratic assignment was thought to be essential for securing good results, linear assignment, which is far more efficient, could be sufficient if learning is performed. This enables speed-ups of graph matching by up to four orders of magnitude while retaining state-of-the-art accuracy.
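The objective described above, a linear node-compatibility term plus a quadratic edge-compatibility term, can be written down for tiny graphs and maximized by exhaustive search. The sketch below is only an illustration of that objective: the compatibility values are hand-picked rather than learned, the brute-force search stands in for the assignment solvers and relaxations used in practice, and all names (`matching_score`, `best_matching`, `NODE_COMPAT`, `EDGES_A`, `EDGES_B`) are assumptions made for the example.

```python
from itertools import permutations


def matching_score(perm, node_compat, edges_a, edges_b, edge_weight=1.0):
    """Score a matching in which node i of graph A maps to node perm[i] of graph B.

    linear term:    sum of node compatibilities node_compat[i][perm[i]]
    quadratic term: edge_weight for every edge of A that lands on an edge of B
    """
    linear = sum(node_compat[i][j] for i, j in enumerate(perm))
    quadratic = sum(
        edge_weight
        for (i, k) in edges_a
        if (perm[i], perm[k]) in edges_b or (perm[k], perm[i]) in edges_b
    )
    return linear + quadratic


def best_matching(node_compat, edges_a, edges_b, edge_weight=1.0):
    """Exhaustively search all node permutations (feasible only for tiny graphs).

    With edge_weight=0.0 the objective reduces to pure linear assignment.
    """
    n = len(node_compat)
    return max(
        permutations(range(n)),
        key=lambda p: matching_score(p, node_compat, edges_a, edges_b, edge_weight),
    )


# Hypothetical example: graph A is the path 0-1-2; graph B is the same path
# with its nodes relabeled so that node 0 is the middle node.
EDGES_A = {(0, 1), (1, 2)}
EDGES_B = {(2, 0), (0, 1)}
NODE_COMPAT = [
    [0.1, 0.2, 0.9],  # A0 looks most like B2
    [0.8, 0.3, 0.1],  # A1 looks most like B0
    [0.2, 0.7, 0.3],  # A2 looks most like B1
]

print(best_matching(NODE_COMPAT, EDGES_A, EDGES_B, edge_weight=0.0))  # linear term only
print(best_matching(NODE_COMPAT, EDGES_A, EDGES_B, edge_weight=1.0))  # linear + quadratic
```

With these node compatibilities, the linear term alone already recovers the same matching as the full objective. That is the situation the abstract argues learning can bring about, although here the values were simply chosen by hand.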
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
In recent years there has been a tremendous increase in the number of users maintaining online blogs on the Internet. Companies, in particular, have become aware of this medium of communication and have taken a keen interest in what is being said about them in such personal blogs. This has given rise to a new field of research directed towards mining useful information from the large amount of unformatted data present in online blogs and forums. We discuss an implementation of such a blog mining application. The application is broadly divided into two parts: the indexing process and the search module. Blogs pertaining to different organizations are fetched from a particular blog domain on the Internet. After their textual content has been analyzed, the blogs are assigned a sentiment rating. Specific data from these blogs, along with their sentiment ratings, are then indexed on the physical hard drive. At run time, the search module searches these indexes for the input organization name and produces a list of blogs conveying both positive and negative sentiments about the organization.
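The two-part design described above (index, then search) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the abstract does not say how sentiment is rated, so a tiny hypothetical word lexicon is used, and an in-memory dictionary stands in for the on-disk index.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon; the abstract does not describe the real rating method.
POSITIVE = {"great", "reliable", "love"}
NEGATIVE = {"slow", "broken", "hate"}


def sentiment_rating(text):
    """Rate text as (# positive words) - (# negative words)."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def build_index(blogs):
    """Indexing process: map organization name -> list of (title, rating).

    blogs is an iterable of (organization, title, text) tuples.
    """
    index = defaultdict(list)
    for organization, title, text in blogs:
        index[organization.lower()].append((title, sentiment_rating(text)))
    return index


def search(index, organization):
    """Search module: return (positive_titles, negative_titles) for an organization."""
    entries = index.get(organization.lower(), [])
    positive = [title for title, rating in entries if rating > 0]
    negative = [title for title, rating in entries if rating < 0]
    return positive, negative


# Hypothetical blog posts, made up for this example.
BLOGS = [
    ("Acme", "Post A", "I love Acme, great support!"),
    ("Acme", "Post B", "The app is slow and broken."),
    ("Other", "Post C", "A great product."),
]
INDEX = build_index(BLOGS)
print(search(INDEX, "acme"))
```

Rating each blog once at indexing time keeps the run-time search to a simple lookup, which matches the split between the indexing process and the search module in the abstract.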
ISBN (print): 9781424410651
An approximate inference approach for credal networks based on ant colony algorithms is put forward. Considering goal-oriented inference in Bayesian networks, where the decision-maker is interested in a particular variable given some evidence, the paper gives an algorithm for obtaining an equivalent goal-oriented credal network structure. The selection of vertices in the credal network is treated as a multistage decision-making problem. On this basis, ant colony algorithms are applied to approximate inference in the credal network, reusing the high-probability vertices of each variable in order to improve the efficiency of the inference algorithm and avoid unnecessary computation. Finally, the validity of the approach is shown by a simple analysis of a complex credal network model.