Social media provides an environment for information exchange. Social media platforms rely principally on their users to create content, to annotate others' content, and to form online relationships; a user's activities in this environment reflect his or her opinions, interests, etc. We focus on analysing this social environment to detect user interests, which are key elements for improving adaptation. This choice is motivated by the lack of information in the user profile and by the inefficiency of information obtained from methods that analyse classic user behaviour (e.g. navigation, time spent on a web page). Faced with an incomplete user profile, the user's social network can thus serve as an important data source for detecting interests. The originality of our approach lies in a new interest-detection technique that analyses the accuracy of a user's tagging behaviour in order to identify the tags that really reflect the content of the resources. Such tags are comprehensible and avoid the ambiguity usually associated with social annotations. The approach combines tag, user and resource in a way that guarantees relevant interest detection. The proposed approach has been tested and evaluated on the Delicious social database. For the evaluation, we compare the results produced by our approach, which uses the tagging behaviour of the user's neighbours (the egocentric network and the communities), with the information already known about the user (his or her profile). A comparative evaluation against the classical tag-based method of interest detection shows that the proposed approach performs better.
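The core idea of the abstract (keep only tags whose accuracy is corroborated across taggers, then derive interests from the neighbours' use of those tags) can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the triples, the corroboration threshold and the helper names are all assumptions for the example.

```python
from collections import Counter

# Hypothetical (user, resource, tag) triples, Delicious-style.
annotations = [
    ("alice", "r1", "python"), ("alice", "r2", "python"),
    ("bob",   "r1", "python"), ("bob",   "r3", "music"),
    ("carol", "r1", "coding"), ("carol", "r2", "python"),
]

def accurate_tags(annotations, min_taggers=2):
    """Keep (resource, tag) pairs applied by several distinct users:
    a rough proxy for tags that really reflect the resource content."""
    taggers = {}
    for user, resource, tag in annotations:
        taggers.setdefault((resource, tag), set()).add(user)
    return {key for key, users in taggers.items() if len(users) >= min_taggers}

def user_interests(annotations, user, neighbours):
    """Rank tags used by the user's neighbours on accurately tagged resources."""
    accurate = accurate_tags(annotations)
    counts = Counter(
        tag for u, r, tag in annotations
        if u in neighbours and (r, tag) in accurate
    )
    return counts.most_common()

print(user_interests(annotations, "alice", {"bob", "carol"}))
```

Here "python" survives because two taggers independently applied it to the same resources, while the idiosyncratic tag "coding" is filtered out as unreliable.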
Semantic annotation of natural language texts labels the meaning of an annotated element in a specific context, and is thus an essential procedure for domain knowledge acquisition. An extensible and coherent annotation method is crucial for knowledge engineers to reduce the human effort needed to keep annotations consistent. This article proposes a comprehensive semantic annotation approach supported by a user-oriented markup language named UOML to enhance annotation efficiency, with the aim of building a high-quality knowledge base. UOML is operable by human annotators and convertible to formal knowledge representation languages. A pattern-based annotation conversion method named PAC is further proposed for knowledge exchange, utilizing automatic pattern learning. We designed and implemented a semantic annotation platform, Annotation Assistant, to test the effectiveness of the approach. By applying this platform for more than three years in a long-term international research project aimed at high-quality knowledge acquisition from a classical Chinese poetry corpus containing 52,621 Chinese characters, we effectively acquired 150,624 qualified annotations. Our test shows that the approach improved operational efficiency by 56.8% on average compared with text-based manual annotation. By using UOML, PAC achieved a conversion error ratio of 0.2% on average, significantly improving annotation consistency compared with baseline annotations. The results indicate the approach is feasible for practical use in knowledge acquisition and conversion.
Anonymization of graph-based data is a problem that has been widely studied in recent years, and several anonymization methods have been developed. Information loss measures have been proposed to evaluate the noise introduced into the anonymized data. Generic information loss measures ignore the intended use of the anonymized data; when data must be released to third parties and there is no control over what kinds of analyses users might perform, these measures are the standard choice. In this paper we study different generic information loss measures for graphs, comparing them to cluster-specific ones. We want to evaluate whether generic information loss measures are indicative of the usefulness of the data for subsequent data-mining processes.
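A generic information loss measure of the kind the abstract discusses compares a structural property of the graph before and after anonymization, independently of any downstream analysis. The sketch below uses the deviation between degree sequences; the graphs and the normalization are illustrative assumptions, not measures taken from the paper.

```python
from collections import Counter

def degree_sequence(edges, n):
    """Sorted degree sequence of an undirected graph on nodes 0..n-1."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg[i] for i in range(n))

def degree_info_loss(original, anonymized, n):
    """Normalized L1 distance between sorted degree sequences:
    0 means the degree structure is fully preserved."""
    d1 = degree_sequence(original, n)
    d2 = degree_sequence(anonymized, n)
    return sum(abs(a - b) for a, b in zip(d1, d2)) / (2 * len(original))

original   = [(0, 1), (0, 2), (1, 2), (2, 3)]
anonymized = [(0, 1), (0, 2), (1, 3), (2, 3)]  # one edge rewired
print(degree_info_loss(original, anonymized, 4))  # prints 0.25
```

A cluster-specific measure would instead compare the clusterings obtained on the two graphs, which is exactly the contrast the paper evaluates.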
Information graphics (infographics) in popular media are highly structured knowledge representations that are generally designed to convey an intended message. This paper presents a novel methodology for retrieving infographics from a digital library that takes into account a graphic's structural and message content. The retrieval methodology can be summarized as follows: 1) hypothesize the requisite structural and message content from a natural language query; 2) measure the relevance of each candidate infographic to the structural and message content hypothesized from the user query; and 3) integrate these relevance measurements via a linear combination model to produce a ranked list of infographics in response to the user query. The methodology has been implemented and evaluated, and it significantly outperforms a baseline method that treats queries and graphics as bags of words. (C) 2015 Published by Elsevier B.V.
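Step 3 of the methodology, the linear combination of the two relevance measurements, can be sketched as below. The scores and the mixing weight are made up for illustration; the paper's actual model would fit or tune such weights.

```python
def rank_infographics(candidates, weight=0.6):
    """Rank candidates by a linear combination of two relevance scores.

    candidates: list of (graphic_id, structural_score, message_score).
    weight: assumed mixing coefficient for the structural component.
    """
    scored = [
        (gid, weight * s + (1 - weight) * m)
        for gid, s, m in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Illustrative relevance scores for three candidate infographics.
candidates = [("g1", 0.9, 0.2), ("g2", 0.5, 0.9), ("g3", 0.4, 0.4)]
print([gid for gid, _ in rank_infographics(candidates)])  # prints ['g2', 'g1', 'g3']
```

The design point is that neither score alone decides the ranking: "g1" wins on structure and "g2" on message content, and the combination arbitrates between them.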
Clustering XML documents by structure is the task of grouping them by common structural components. Hitherto, this has been accomplished by looking at the occurrence of one pre-established type of structural component in the structures of the XML documents. However, the a priori chosen structural components may not be the most appropriate for effective clustering. Moreover, it is likely that the resulting clusters exhibit a certain degree of inner structural inhomogeneity, because of uncaught differences in the structures of the XML documents due to further, neglected forms of structural components. To overcome these limitations, a new hierarchical approach is proposed that allows considering (if necessary) multiple forms of structural components to isolate structurally homogeneous clusters of XML documents. At each level of the resulting hierarchy, clusters are divided by considering some type of structural component (unaddressed at the preceding levels) that still differentiates the structures of the XML documents. Each cluster in the hierarchy is summarized through a novel technique that provides a clear and differentiated understanding of its structural properties. A comparative evaluation over both real and synthetic XML data proves that the devised approach outperforms established competitors in effectiveness and scalability. Cluster summarization is also shown to be very representative. (C) 2012 Elsevier B.V. All rights reserved.
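A toy version of one level of such a hierarchy groups XML documents by a single type of structural component, here their set of element names; a further level could then split each group by another component (e.g. parent-child edges). The documents and the choice of signature are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def element_name_signature(xml_text):
    """One type of structural component: the set of element names used."""
    root = ET.fromstring(xml_text)
    return frozenset(el.tag for el in root.iter())

def cluster_by_signature(docs):
    """Group documents whose signatures coincide (one hierarchy level)."""
    clusters = defaultdict(list)
    for doc_id, text in docs.items():
        clusters[element_name_signature(text)].append(doc_id)
    return {sig: sorted(ids) for sig, ids in clusters.items()}

docs = {
    "d1": "<book><title/><author/></book>",
    "d2": "<book><author/><title/></book>",
    "d3": "<cd><title/><artist/></cd>",
}
print(cluster_by_signature(docs))
```

Note that "d1" and "d2" land in the same cluster despite different child orderings, which is exactly the kind of inner inhomogeneity a deeper level (using, say, sibling order as its structural component) could still separate.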
XML has gained widespread acceptance as a premier format for publishing, sharing and manipulating data through the web. While the semi-structured nature of XML provides a high degree of syntactic flexibility, there are significant shortcomings when it comes to specifying the semantics of XML data. For the advancement of XML applications it is therefore a major challenge to discover natural classes of constraints that can be utilized effectively by XML data engineers. This endeavor is ambitious given the multitude of intractability results that have been established. We investigate a class of XML cardinality constraints that is precious in the sense that it keeps the right balance between expressiveness and efficiency of maintenance. In particular, we characterize the associated implication problem axiomatically and develop a low-degree polynomial-time algorithm that can be readily applied for deciding implication. Our class of constraints is chosen near-optimally, as already minor extensions of its expressiveness cause potential intractability. Finally, we transfer our findings to establish a precious class of soft cardinality constraints on XML data. Soft cardinality constraints need to be satisfied on average only, and thus permit violations in a controlled manner. Soft constraints are therefore able to tolerate exceptions that frequently occur in practice, yet can be reasoned about efficiently. (C) 2012 Elsevier B.V. All rights reserved.
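The hard versus soft distinction can be illustrated with a toy maximum-cardinality constraint: every order element may contain at most k item elements. The hard check requires the bound at every node, while the soft variant, in the spirit of "satisfied on average only", bounds the mean. The element names and checks below are illustrative, far simpler than the path-based constraints the paper studies.

```python
import xml.etree.ElementTree as ET

def satisfies_max_card(xml_text, parent_tag, child_tag, k):
    """Hard constraint: every parent has at most k matching children."""
    root = ET.fromstring(xml_text)
    parents = root.findall(f".//{parent_tag}")
    return all(len(p.findall(child_tag)) <= k for p in parents)

def satisfies_soft_max_card(xml_text, parent_tag, child_tag, k):
    """Soft variant: the *average* number of children must not exceed k."""
    root = ET.fromstring(xml_text)
    counts = [len(p.findall(child_tag)) for p in root.findall(f".//{parent_tag}")]
    return sum(counts) / len(counts) <= k if counts else True

doc = """<orders>
  <order><item/><item/></order>
  <order><item/></order>
</orders>"""

print(satisfies_max_card(doc, "order", "item", 2))  # prints True
```

With one exceptional order holding three items and another holding one, the hard constraint with k = 2 is violated but the soft one still holds, which is exactly the controlled tolerance for exceptions the abstract describes.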