Current tree-to-tree models suffer from parsing errors as they usually use only 1-best parses for rule extraction and decoding. We instead propose a forest-based tree-to-tree model that uses packed forests. The model ...
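As a rough illustration of the data structure this abstract refers to (hypothetical names, not the paper's code), a packed forest can be sketched as a hypergraph in which every node packs several alternative derivations instead of keeping only the 1-best parse:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Minimal sketch (assumed, not the paper's implementation): a packed forest as
# a hypergraph whose nodes store alternative derivations (hyperedges).
@dataclass
class Hyperedge:
    rule: str                  # grammar rule label, e.g. "NP -> DT NN"
    tails: List["Node"]        # child nodes this derivation uses
    score: float               # parser/model score of this derivation

@dataclass
class Node:
    label: str                 # nonterminal label, e.g. "NP"
    span: Tuple[int, int]      # word span covered by this node
    incoming: List[Hyperedge] = field(default_factory=list)  # packed alternatives

# A 1-best parse keeps only the highest-scoring incoming hyperedge per node;
# a packed forest keeps several, so rule extraction and decoding can consider
# alternative parses and are less sensitive to parsing errors.
def best_derivation(node: Node) -> Hyperedge:
    return max(node.incoming, key=lambda e: e.score)
```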
Current SMT systems usually decode with a single translation model and cannot benefit from the strengths of other models in the decoding phase. We instead propose joint decoding, a method that combines multiple translation...
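A heavily simplified sketch of the general idea (an assumption for illustration, not the paper's decoder): scores from several translation models can be combined log-linearly when ranking a candidate translation, so that decoding draws on the strengths of each model. The toy models and weights below are made up.

```python
# Minimal sketch (assumed): log-linear combination of several translation
# models' scores for one candidate translation.
def joint_score(candidate, models, weights):
    """models: callables returning a log-score for the candidate string."""
    return sum(w * m(candidate) for m, w in zip(models, weights))

# Hypothetical toy models scoring a candidate translation.
phrase_model = lambda c: -0.5 * len(c.split())   # stand-in phrase-based score
syntax_model = lambda c: -0.3 * len(c)           # stand-in syntax-based score
print(joint_score("the house is small", [phrase_model, syntax_model], [0.6, 0.4]))
```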
Hyper Surface Classification (HSC), which is based on the Jordan Curve Theorem in topology, has been proven in our previous work to be a simple and effective method for classifying large databases. In this paper, through theoretical analysis, we find that different scales may affect the training process of HSC and thus influence its classification performance. To investigate this impact and find a suitable scale, we study the scale transformation of HSC. The experimental results show that accuracy increases as the scale shrinks, but the effect is tiny. Furthermore, we find that some samples become inconsistent and repetitious when the scale is sufficiently small, because the computer's numeric data types cannot provide enough precision. Fortunately, as the experiments show, HSC achieves high performance with common scales.
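The precision effect described above can be illustrated with a small, generic floating-point example (an illustration under our own assumptions, not the paper's code): once samples are mapped into a sufficiently small sub-interval, the offsets that distinguish them fall below machine precision and the samples collapse into identical, repetitious ones.

```python
import numpy as np

# Minimal illustrative sketch: three distinct samples mapped into an ever
# smaller sub-interval of the unit region; at some point the differences are
# smaller than the spacing of representable doubles and the samples collapse.
x = np.array([0.111, 0.222, 0.333])          # three distinct samples in [0, 1]

for shrink in (1e-2, 1e-8, 1e-17):
    mapped = 0.5 + x * shrink                # place samples into [0.5, 0.5 + shrink)
    print(f"shrink={shrink:g}: {len(np.unique(mapped))} distinct of {len(x)}")
```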
Increasing effort has been devoted to discovering meaningful unusual patterns, which may indicate fraud or anomalies. In this paper, a novel unsupervised approach for discovering meaningful unusual observations is proposed. We first apply an unsupervised version of the Hyper Surface Classification (HSC) algorithm to obtain the separating hyper surface. It requires no domain knowledge but cannot discover local unusual patterns. To solve this problem, we additionally search the Minimum Spanning Tree (MST). Given domain knowledge, a subdividing process is proposed to detect unusual patterns within each Minimum Spanning Tree. Experimental results show that our approach can detect unusual patterns effectively, including some that are overlooked by traditional clustering and outlier detection algorithms.
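A minimal sketch of the MST step, under the assumption that "unusual" points are those attached only by unusually long edges; the cut-off rule below is a stand-in, not necessarily the paper's subdividing process.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

# Minimal sketch: build an MST over the samples and flag endpoints of
# unusually long edges as candidate local unusual observations.
def mst_outliers(points, factor=3.0):
    dist = squareform(pdist(points))                # pairwise distance matrix
    mst = minimum_spanning_tree(dist).toarray()     # upper-triangular MST weights
    edges = mst[mst > 0]
    cutoff = edges.mean() + factor * edges.std()    # "unusually long" threshold
    rows, cols = np.where(mst > cutoff)
    return set(rows) | set(cols)                    # endpoints of long edges

points = np.vstack([np.random.randn(50, 2), [[8.0, 8.0]]])  # one far-away point
print(mst_outliers(points))                                  # typically contains index 50
```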
The autonomic unit is the essential element and the most basic component of autonomic systems. The ability to handle emotions is considered to make an autonomic unit more intelligent, more communicative and more soc...
In this paper, we propose a vehicle detection method based on AdaBoost. We focus on detecting front-view cars and buses with occlusions on highways. Samples with different occlusion situations are selected into the...
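For context, a minimal generic AdaBoost sketch (not the paper's detector): boosted decision stumps trained on window features, with random arrays standing in for the image features such detectors typically use.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Minimal generic AdaBoost sketch (assumed stand-in data, not the paper's
# detector): the default weak learner is a depth-1 decision stump.
X = np.random.rand(200, 64)             # 200 candidate windows, 64 hypothetical features
y = np.random.randint(0, 2, 200)        # 1 = vehicle, 0 = background (toy labels)

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X, y)
print(clf.predict(X[:5]))               # predicted labels for the first 5 windows
```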
Integration of bioinformatics data repositories is a challenging task in which data sets are usually heterogeneous in structure and are often distributed across multiple, autonomously maintained databases. In this context, we present an innovative system which coordinates bioinformatics data by combining the P2P data integration paradigm and Distributed Dynamic Description Logics (D3L) on top of a Multi-Agent System infrastructure. We define the semantics and syntax of D3L, and propose a distributed consistency checking algorithm that realizes intelligent queries with logical reasoning and decomposes large tasks into sub-tasks that can be tackled by different agents. Finally, we introduce a prototype implementation and present its evaluation. The results indicate that the proposed approach achieves excellent robustness and satisfactory performance.
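A heavily simplified sketch of the coordination idea only (an assumption for illustration; it omits the D3L reasoning entirely): a coordinator decomposes a query into sub-queries, hands them to per-database agents, and merges the partial answers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local databases held by two autonomously maintained agents.
LOCAL_DBS = {
    "agent_a": {"geneX": "pathway1"},
    "agent_b": {"geneX": "pathway2", "geneY": "pathway3"},
}

def agent_query(agent_name, gene):
    # each agent answers only from its own local database
    db = LOCAL_DBS[agent_name]
    return {gene: db[gene]} if gene in db else {}

def coordinated_query(gene):
    # decompose the query into per-agent sub-tasks and merge the answers
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda a: agent_query(a, gene), LOCAL_DBS)
    merged = {}
    for part in partials:
        for key, value in part.items():
            merged.setdefault(key, set()).add(value)
    return merged

print(coordinated_query("geneX"))   # answers combined from both agents
```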
The storage of data is a key issue for information systems and an important foundation for data query and data mining. The relational database model has been proven to be a very useful data-storage technique. As information is stored as data in relational databases, the induction of concepts from data is a pivotal topic in the data mining field. Formal Concept Analysis (FCA) turns out to be a perfect instrument for a meaningful and conceptual exploration of the stored data. In FCA, conceptual scaling provides a complete framework for transforming any many-valued context (i.e., relation/table) into a context (called a derived context), in which each many-valued attribute is given a scale. The attributes in a scale basically describe meaningful features of the values of the initial attribute. From the logical point of view, the complement operation plays a very important role in relational databases and data query systems. In this paper, we provide the connections between the concepts of binary relations and those of complementary binary relations, and propose an approach toward normalizing (complementary) scales, i.e., each (complementary) scale can be represented by a set of statements. One advantage of normalizing scales is avoiding the generation of huge derived relations, and hence this approach reduces storage cost. By the normalization, the concept lattice of the complement of a derived relation is reduced to a combination of the concept lattice of the derived relation and a set of statements.
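A minimal sketch of the standard FCA ingredients the abstract builds on (the toy context and function names are illustrative, not the paper's normalization procedure): the two derivation operators on a binary context and the complementary relation used for the complement operation.

```python
# Toy binary context: object -> set of attributes it has.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"b", "c"},
    "o3": {"a", "c"},
}
ATTRIBUTES = {"a", "b", "c"}

def intent(objects, ctx=CONTEXT):
    """Attributes shared by all given objects (derivation operator on objects)."""
    sets = [ctx[o] for o in objects]
    return set.intersection(*sets) if sets else set(ATTRIBUTES)

def extent(attrs, ctx=CONTEXT):
    """Objects having all given attributes (derivation operator on attributes)."""
    return {o for o, a in ctx.items() if attrs <= a}

def complement_context(ctx=CONTEXT):
    """Complementary relation: an object has an attribute iff it did not before."""
    return {o: ATTRIBUTES - a for o, a in ctx.items()}

# A formal concept is a pair (extent, intent) closed under both operators,
# e.g. the concept generated by the attribute set {"a"}:
A = {"a"}
print(extent(A), intent(extent(A)))   # extent {'o1', 'o3'}, intent {'a'}
print(complement_context())           # the relation underlying complementary scales
```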
Object-attribute-value relationships, which are a frequently used data structure for coding real-world problems, are formalized via many-valued contexts in Formal Concept Analysis (FCA). The aim of FCA is to explore (formal) concepts from empirical data contexts. A concept of a context consists of two parts: the extent (the objects the concept covers) and the intent (the attributes describing the concept). From the logical point of view, the intent of each concept is a conjunction of some attributes. Similar to conjunction, negation and disjunction are also important logical operations on attributes or attribute-value pairs, which are common in human language. However, classical FCA as a mathematical theory of concepts lacks a theory of negation and disjunction. In this paper, we take negation and disjunction into consideration in the process of constructing concepts, and hence obtain the following extended concepts: negative concepts of a context (i.e., a binary relation), negative concepts of a relation with some scales, ∨-concepts of a relation, ∨-concepts of a relation with some scales, logical concepts of a relation, and logical concepts of a relation with some scales. Compared with concepts in classical FCA, these extended concepts are more pertinent and more meaningful.
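A minimal sketch of the extended extents involved (illustrative semantics over a toy context, not the paper's exact construction): the objects satisfying a conjunction, a negation, and a disjunction of attributes.

```python
# Toy binary context: object -> set of attributes it has.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"b", "c"},
    "o3": {"a", "c"},
}
OBJECTS = set(CONTEXT)

def ext_and(attrs):                       # objects having ALL of the attributes
    return {o for o, s in CONTEXT.items() if attrs <= s}

def ext_not(attr):                        # objects NOT having the attribute
    return OBJECTS - ext_and({attr})

def ext_or(attrs):                        # objects having AT LEAST ONE attribute
    return {o for o, s in CONTEXT.items() if attrs & s}

print(ext_and({"a", "b"}))                # {'o1'}
print(ext_not("b"))                       # {'o3'}
print(ext_or({"a", "b"}))                 # {'o1', 'o2', 'o3'}
```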
ISBN: 9781605586083 (print)
Conventional object recognition techniques rely heavily on manually annotated image datasets to achieve good performance. However, collecting high-quality datasets is laborious. In this paper, we propose a semi-supervised framework for learning visual categories from Google Images. The 1st- and 2nd-order features, which define the bag-of-words representation and the spatial relationships between local features respectively, make up an independent and redundant feature split. We then integrate the co-training algorithm CoBoost with these two kinds of features. During training we create two boosting classifiers based on the 1st- and 2nd-order features respectively, with each classifier providing labels for the other. In addition, the 2nd-order features are generated dynamically rather than extracted exhaustively, to avoid high computational cost. An active learning technique is also introduced to further improve the performance. We evaluate our method on benchmark datasets, showing results competitive with state-of-the-art unsupervised approaches and some supervised techniques.
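A minimal co-training sketch of the general idea only (logistic regression stands in for the boosting classifiers, and two random feature matrices stand in for the 1st- and 2nd-order views; none of this is the paper's code): the classifier on each view hands its most confident labels on the unlabeled pool to the other.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal co-training sketch (assumed stand-ins, not CoBoost itself).
def co_train(views, y, labeled_idx, rounds=5, add_per_round=5):
    labeled = set(labeled_idx)
    labels = y.copy()
    for _ in range(rounds):
        for X in views:                                   # each view takes a turn teaching
            lab = sorted(labeled)
            unlab = [i for i in range(len(y)) if i not in labeled]
            if not unlab:
                return labels
            clf = LogisticRegression(max_iter=1000).fit(X[lab], labels[lab])
            proba = clf.predict_proba(X[unlab])
            # the most confident predictions become labels for the other view
            for j in np.argsort(-proba.max(axis=1))[:add_per_round]:
                labels[unlab[j]] = clf.classes_[proba[j].argmax()]
                labeled.add(unlab[j])
    return labels

# Toy two-view data: 100 samples, only the first 10 labeled.
X1, X2 = np.random.randn(100, 4), np.random.randn(100, 6)
y = np.random.randint(0, 2, 100)
print(co_train((X1, X2), y, labeled_idx=range(10))[:20])
```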