ISBN: (print) 1581139683
Data mining provides the opportunity to extract useful information from large databases, and various techniques have been proposed to extract this information as efficiently as possible. However, efficiency is not our only concern in this study: the security and privacy of the extracted knowledge must be seriously considered as well. With this in mind, we study the problem of hiding sensitive association rules in binary data sets by blocking some data values, and we present an algorithm for solving it. We also provide a fuzzification of the support and the confidence of an association rule in order to accommodate blocked/unknown values. In addition, we quantitatively compare the proposed algorithm with previously published algorithms by running experiments on binary data sets, and we qualitatively compare the efficiency of the proposed algorithm in hiding association rules. We utilize the notion of border rules, assigning a weight to each rule, and we use effective data structures for representing the rules so as (a) to minimize the side effects created by the hiding process and (b) to speed up the selection of the victim transactions. Finally, we discuss the advantages and the limitations of blocking. Copyright 2004 ACM.
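To illustrate how blocked values turn support and confidence into ranges rather than single numbers, the following is a minimal sketch and not the algorithm of the paper. It assumes a binary data set in which each value is 1, 0, or None (blocked/unknown). The names `support_interval` and `confidence_interval`, and the deliberately loose confidence bounds, are assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# A transaction maps each item to 1 (present), 0 (absent) or None (blocked/unknown).
Transaction = Dict[str, Optional[int]]


def support_interval(db: List[Transaction], itemset: FrozenSet[str]) -> Tuple[float, float]:
    """Return (minimum, maximum) support of `itemset` over a non-empty `db`."""
    n = len(db)
    # Minimum support: every item of the itemset is definitely present.
    definite = sum(1 for t in db if all(t[i] == 1 for i in itemset))
    # Maximum support: no item is definitely absent (blocked values may be 1).
    possible = sum(1 for t in db if all(t[i] != 0 for i in itemset))
    return definite / n, possible / n


def confidence_interval(db: List[Transaction], lhs: FrozenSet[str],
                        rhs: FrozenSet[str]) -> Tuple[float, float]:
    """Loose (assumed) bounds on the confidence of the rule lhs => rhs."""
    lo_ab, hi_ab = support_interval(db, lhs | rhs)
    lo_a, hi_a = support_interval(db, lhs)
    low = lo_ab / hi_a if hi_a > 0 else 0.0
    high = min(1.0, hi_ab / lo_a) if lo_a > 0 else 1.0
    return low, high


if __name__ == "__main__":
    db = [
        {"a": 1, "b": 1},
        {"a": 1, "b": None},   # b is blocked
        {"a": 1, "b": 0},
        {"a": None, "b": 1},   # a is blocked
    ]
    print(support_interval(db, frozenset({"a", "b"})))
    print(confidence_interval(db, frozenset({"a"}), frozenset({"b"})))
```

Under this reading, a rule can be treated as hidden once the lower end of its support or confidence range falls below the mining threshold, since a miner can no longer be certain that the rule holds.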
ISBN: (print) 2951740816
In this paper we address the question of whether there is value in having multiple layers of semantic information associated with corpus semantic annotation. In this context we introduce a semantic annotation experiment in which novice annotators were asked to assign sense tags to a set of polysemous corpus nouns, using Wordnet as their referential sense repository. Wordnet is a rich sense inventory that provides explicit information about the semantic types associated with every word sense. To measure the effect that knowledge of semantic types has on the sense assignment process, we carried out two annotation sessions. In the first session, annotators relied exclusively on Wordnet synsets to annotate corpus nouns, whereas in the second session the same pool of annotators examined Wordnet synsets in conjunction with their semantic types prior to assigning a sense tag. Comparing annotators' performance in the two sessions shows that when consulting semantic types, annotators assigned more salient senses to highly polysemous nouns, whereas for the same set of terms, when relying exclusively on Wordnet synsets, annotators tended to assign narrower senses, which, however, were more error-prone. The results indicate that semantic types have potential for dealing with subtle sense distinctions in the course of corpus annotation.
ISBN: (print) 2951740816
As the size of the Web grows, it becomes imperative to equip search engines with sophisticated indexing modules that enable a meaningful organization of the stored data. In this paper we present a structured multilingual conceptual repository that has been employed as the backbone of a conceptual indexing and retrieval system. Our conceptual warehouse originates from a multilingual semantic network (Balkanet) and its Inter-Lingual-Index (ILI), which was enriched with domain ontology information inherited from the SUMO ontology. We report on the ontology's design principles and provide a description of its structure. We argue that an important attribute of Balkanet's ILI is its flexibility in incorporating new concepts and/or languages by allowing shared semantic attributes to percolate to all concepts represented within taxonomies. We further present our approach to conceptual indexing and introduce an indexing algorithm that utilizes Balkanet's classified conceptual taxonomies. Finally, we discuss how conceptual taxonomies can help retrieval algorithms link terms used in search requests to semantically related terms that might be found in the indexed documents.
In this paper, we present the design, implementation and evaluation of FESMI, a fuzzy expert system that deals with diagnosis and treatment of male impotence. The diagnosis process, linguistic variables and their valu...
Fairly rapid environmental changes call for continuous surveillance and on-line decision making. There are two main areas where IT technologies can be valuable. In this paper we present a multi-agent system for monito...
We present a virtual laboratory designed and implemented in the framework of the VirRAD European project. This laboratory is a 3D simulation of a radio-pharmacy laboratory in which learners, represented by 3D avatars, can experiment on radio-pharmacy equipment by carrying out specific learning scenarios. We describe the functionality provided by this laboratory, the motivating factors that led to its creation, the technological decisions that were made to optimize the system, and the envisioned next steps.
ISBN: (print) 9780769521008
Understanding and modeling user online behavior, as well as predicting future requests, remain an open challenge for researchers, analysts and marketers. In this paper, we propose an efficient prediction schema based on the extraction of sequential navigation patterns from server log files, combined with web site topology. Traversed paths are monitored, internally recorded and cleaned before being completed with cached page views. Session and episode identification is followed by the construction of n-grams. Prediction is based upon a 5 + n-gram schema with all lower level n-grams participating, a procedure that resembles the construction of an All 5th-order Markov Model. The schema achieves full coverage while maintaining competitive prediction precision.
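The idea of letting all lower level n-grams participate in a prediction can be sketched as a back-off lookup: try the longest matching context first and fall back to shorter ones. The following is a minimal illustration only, assuming sessions are already cleaned lists of page identifiers. The function names `build_ngram_model` and `predict_next` are hypothetical, and the sketch omits the site topology and cached page view completion described above.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Sequence, Tuple

Model = DefaultDict[Tuple[str, ...], Counter]


def build_ngram_model(sessions: Sequence[Sequence[str]], max_order: int = 5) -> Model:
    """Count, for every context of length 1..max_order, which page followed it."""
    model: Model = defaultdict(Counter)
    for session in sessions:
        for i in range(1, len(session)):
            for k in range(1, max_order + 1):
                if i - k < 0:
                    break  # not enough history for a context of length k
                context = tuple(session[i - k:i])
                model[context][session[i]] += 1
    return model


def predict_next(model: Model, history: Sequence[str], max_order: int = 5) -> Optional[str]:
    """Predict the next page, backing off from the longest context to the shortest."""
    for k in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # no context of any length was seen during training


if __name__ == "__main__":
    sessions = [
        ["home", "news", "sports"],
        ["home", "news", "sports"],
        ["blog", "news", "weather"],
    ]
    model = build_ngram_model(sessions)
    print(predict_next(model, ["home", "news"]))
    print(predict_next(model, ["blog", "news"]))
```

In this sketch coverage depends on the shortest context having been observed; a page never seen in training still yields no prediction, so the full coverage reported above should not be attributed to this simplified version.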
Web portals offer many services and a wealth of content to Web users. However, most users are not interested in all the content present on these sites. Most of them visit a few specific sites and browse specific thematic areas within them. In this paper, a software technique is presented that allows the viewers of Web sites to build their own personalized portals, using specific thematic areas of their preferred sites. This transcoding technique is based on an algorithm that splits a Web page into discrete fragments using the page's internal structure. A training and update procedure is used to identify the different instances of thematic areas at different points in time.
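As a rough illustration of splitting a page into fragments by its internal structure, the sketch below treats every top-level element under the body as one fragment and collects its text. This is an assumption made for the example, not the fragmentation algorithm of the paper; the class name `FragmentSplitter` is hypothetical, and the code assumes well-formed markup with properly closed tags.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class FragmentSplitter(HTMLParser):
    """Collect (label, text) for each top-level element inside <body>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.depth = 0  # nesting depth below <body>
        self.fragments: List[Tuple[str, str]] = []
        self._label: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            return
        if not self.in_body or tag in VOID:
            return
        if self.depth == 0:
            # A new top-level element starts a new fragment, labeled by id or tag.
            self._label = dict(attrs).get("id") or tag
            self._text = []
        self.depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
            return
        if not self.in_body or tag in VOID or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.fragments.append((self._label, " ".join(self._text)))

    def handle_data(self, data):
        if self.in_body and self.depth > 0 and data.strip():
            self._text.append(data.strip())


if __name__ == "__main__":
    page = ('<html><body><div id="news"><h2>News</h2><p>Item one<br>Item two</p></div>'
            '<div id="sports"><p>Scores</p></div></body></html>')
    splitter = FragmentSplitter()
    splitter.feed(page)
    print(splitter.fragments)
```

A personalized portal could then keep only the fragments whose labels a user selected, although matching the same thematic area across page updates would need the training and update procedure mentioned above.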
In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological p...
Power dissipation during scan-based testing has gained significant importance in the past few years. In this work we examine the use of transition frequency based on scan cell ordering techniques in pseudorandom scan-based BIST in order to reduce average power dissipation. We also propose resetting the input register of the circuit, together with ordering its elements, to further reduce average power dissipation. Experimental results indicate that the proposed techniques can reduce average power dissipation by up to 57.7%.