An important problem in the area of homeland security is to identify abnormal or suspicious entities in large data sets. Although there are methods from data mining and social network analysis focusing on finding patterns or central nodes from networks or numerical data sets, there has been little work aimed at discovering abnormal instances in large complex semantic graphs, whose nodes are richly connected with many different types of links. In this paper, we describe a novel unsupervised framework to identify such instances. Besides discovering abnormal instances, we believe that to complete the process, a system has to also provide users with understandable explanations for its findings. Therefore, in the second part of the paper, we describe an explanation mechanism to automatically generate human-understandable explanations for the discovered results. To evaluate our discovery and explanation systems, we perform experiments on several different semantic graphs. The results show that our discovery system outperforms state-of-the-art unsupervised network algorithms used to analyze the 9/11 terrorist network and other graph-based outlier detection algorithms by a significant margin. Additionally, the human study we conducted demonstrates that our explanation system, which provides natural language explanations for the system's findings, allowed human subjects to perform complex data analysis in a much more efficient and accurate manner.
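The abstract does not spell out the discovery algorithm, but the general idea of unsupervised outlier detection over a graph with many link types can be loosely illustrated as follows. In this sketch, every name, the toy graph, and the scoring rule (average rarity of a node's link types) are invented for illustration; they are not the authors' method:

```python
from collections import Counter

# Hypothetical toy semantic graph: (source, link_type, target) triples.
EDGES = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("carol", "works_at", "acme"),
    ("dave", "works_at", "acme"),
    ("alice", "friend_of", "bob"),
    ("bob", "friend_of", "carol"),
    ("dave", "wires_money_to", "offshore_corp"),
]

def abnormality(node, edges):
    """Score a node by the average rarity of the link types it touches:
    link types that are rare in the whole graph contribute high scores."""
    type_freq = Counter(t for _, t, _ in edges)
    total = sum(type_freq.values())
    profile = Counter(t for s, t, o in edges if node in (s, o))
    if not profile:
        return 0.0
    return sum((1 - type_freq[t] / total) * n
               for t, n in profile.items()) / sum(profile.values())

nodes = {n for s, _, o in EDGES for n in (s, o)}
ranked = sorted(nodes, key=lambda n: abnormality(n, EDGES), reverse=True)
```

In this toy graph, "dave" and the entity he wires money to surface near the top of the ranking because `wires_money_to` is a rare link type; a real system over a richly connected semantic graph would exploit far more structure than edge-type frequencies.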
Information in digital libraries and information systems frequently refers to locations or objects in geographic space. Digital gazetteers are commonly employed to match the referenced placenames with actual locations in information integration and data cleaning procedures. This process may fail due to missing information in the gazetteer, multiple matches, or false positive matches. We have analyzed the cases of success and the reasons for failure of the mapping process to a gazetteer. Based on these, we present a statistical model that permits estimating 1) the completeness of a gazetteer with respect to a specific target area and application, 2) the expected precision and recall of one-to-one mappings of source placenames to the gazetteer, 3) the semantic inconsistency that remains in one-to-one mappings, and 4) the degree to which precision and recall improve when the identity of higher levels in a hierarchy of places is known. The presented model is based on a statistical analysis of the mapping process itself, applied to a large set of placenames, and does not require any other background data. The statistical model assumes that a gazetteer is populated by a stochastic process. The paper discusses how future work could take deviations from this assumption into account. The method has been applied to a real case.
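As a toy illustration of the quantities such a model estimates, not of the paper's statistical model itself, the sketch below computes gazetteer completeness and the precision/recall of a mapping strategy that accepts only unique matches. The match outcomes and the ground-truth flags are entirely invented:

```python
# Invented match outcomes: for each source placename, how many gazetteer
# entries it matched, and (toy ground truth) whether a unique match was
# the correct place; None = not applicable.
outcomes = [
    ("Springfield", 5, None),   # ambiguous: several candidate places
    ("Ulm", 1, True),
    ("Neu-Ulm", 1, True),
    ("Gotham", 0, None),        # missing from the gazetteer
    ("Paris", 3, None),
    ("Bremen", 1, False),       # unique but wrong: a false positive
]

def one_to_one_stats(outcomes):
    """Completeness of the gazetteer, plus precision/recall of the
    strategy that accepts a mapping only when exactly one entry matches."""
    accepted = [o for o in outcomes if o[1] == 1]
    correct = [o for o in accepted if o[2]]
    completeness = sum(1 for o in outcomes if o[1] > 0) / len(outcomes)
    precision = len(correct) / len(accepted) if accepted else 0.0
    recall = len(correct) / len(outcomes)
    return completeness, precision, recall
```

The point of the paper's model is to estimate such figures statistically without ground truth; the toy above only shows what the three quantities measure.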
A new kind of metadata offers a synthesized view of an attribute's values for a user to exploit when creating or refining a search query in data-integration systems. The extraction technique that obtains these values is automatic and independent of an attribute domain but parameterized with various metrics for similarity measures. The authors describe a fully implemented prototype and some experimental results to show the effectiveness of "relevant values" when searching a knowledge base.
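A minimal sketch of the general idea, assuming a greedy similarity-based grouping of an attribute's values; the normalization step, the `difflib` metric, and the threshold are illustrative stand-ins for the parameterized similarity measures the abstract mentions, not the authors' extraction technique:

```python
from difflib import SequenceMatcher

def _norm(s):
    """Crude normalization: lowercase and drop punctuation/whitespace."""
    return "".join(c for c in s.lower() if c.isalnum())

def relevant_values(values, threshold=0.75):
    """Greedily group attribute values by string similarity and keep the
    first member of each group as its representative "relevant value".
    The similarity metric is meant to be pluggable; difflib's ratio is
    used here purely for illustration."""
    reps = []
    for v in values:
        if not any(SequenceMatcher(None, _norm(v), _norm(r)).ratio() >= threshold
                   for r in reps):
            reps.append(v)
    return reps
```

For example, `relevant_values(["IBM", "I.B.M.", "Microsoft", "Microsft", "Oracle"])` collapses the near-duplicate spellings and keeps one representative per group, which is the kind of synthesized view a user could exploit when refining a query.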
The whole product life cycle consists of three phases: Beginning of Life (BOL), Middle of Life (MOL), and End of Life (EOL). Although large amounts of product life-cycle data are generated over the whole product life cycle, data flows become rather vague after BOL. Over the last decade, however, emerging Internet, wireless mobile telecommunication, and product identification technologies have created the potential to make the whole product life cycle visible. As a result, the scope of data to be managed has expanded to cover the whole product life cycle. Hence, it becomes important to describe product life-cycle metadata in a systematic manner. Although much attention has been paid to data modeling of objects such as products and processes, modeling methodology for product life-cycle metadata is not well developed. To cope with this limitation, we develop a modeling method for product life-cycle metadata using the Resource Description Framework (RDF). We define an RDF data model and its schema for describing and managing product life-cycle metadata. In addition, we describe how the proposed RDF model can be usefully applied to track, trace, and infer product life-cycle data with an RDF query language.
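As a rough illustration of triple-based life-cycle tracking, the sketch below stores RDF-style statements as plain Python tuples and answers a wildcard pattern query, loosely mirroring a basic SPARQL triple pattern. The predicate and resource names are invented, not the paper's schema, and a production system would use a real RDF store and query language:

```python
# RDF-style (subject, predicate, object) triples covering one product's
# life cycle; every name below is an invented illustration.
TRIPLES = [
    ("product:42", "rdf:type", "plm:Product"),
    ("product:42", "plm:producedAt", "site:factoryA"),    # BOL
    ("product:42", "plm:maintainedAt", "site:serviceB"),  # MOL
    ("product:42", "plm:recycledAt", "site:plantC"),      # EOL
]

def query(triples, s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Trace everything recorded about product 42 across its life cycle:
trace = query(TRIPLES, s="product:42")
```

The same pattern-matching query answers both tracking questions ("where was product 42 recycled?") and tracing questions ("which products were produced at factory A?") over the one triple set.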
This paper presents a new complicated-knowledge representation method for the self-reconfiguration of complex systems such as complex software systems, complex manufacturing systems, and knowledgeable manufacturing systems. Herein, the new concepts of a knowledge mesh (KM) and an agent mesh (AM) are proposed, along with a new KM-based approach to complicated-knowledge representation. A KM is the representation of complicated macroknowledge such as an advanced manufacturing mode, focusing on knowledge about the structure, functions, and information flows of an advanced manufacturing system. The multiple set, the KM, and the mapping relationships between the two are then formally defined. The union, intersection, and minus operations on multiple sets are proposed, and their properties are proved. Then, the perfectness of a KM, the redundancy set between two KMs, and the multiple redundancy set on the redundancy set are defined. Three examples are provided to illustrate the concepts of the KM, the multiple set, the multiple redundancy set, and the logical operations. On this basis, the KM-based inference engine is presented. In logical operations on KMs, each KM is taken as an operand. A new KM obtained by operations on KM multiple sets can be mapped into an AM for automatic reconfiguration of complex software systems. Finally, the combination of two real management modes is exemplified to show the effective application of the new KM-based method to the self-reconfiguration of complex systems. It is worth mentioning that KM multiple sets can also be taken as a new formal representation of software systems if their corresponding AMs are the real software systems.
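The multiple-set ("multiset") operations the abstract names can be loosely illustrated with Python's `Counter`. The component names are invented stand-ins for knowledge-mesh elements, and note that the abstract does not define the operations precisely, so the max/min/saturating-difference semantics of `Counter` below is an assumption, not necessarily the paper's definition:

```python
from collections import Counter

# Two toy knowledge meshes as multisets of components (names invented).
km1 = Counter({"scheduling_rule": 2, "inventory_model": 1})
km2 = Counter({"scheduling_rule": 1, "quality_model": 1})

union = km1 | km2         # max multiplicity per element
intersection = km1 & km2  # min multiplicity per element
minus = km1 - km2         # multiplicity-wise difference, floored at zero
```

Combining two KMs this way, so that the resulting multiset can be mapped onto an agent mesh, mirrors the abstract's example of merging two real management modes during self-reconfiguration.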