ISBN (print): 9783030898205; 9783030898199
The leakage of sensitive information is a pressing problem in digital information processing because of the economic, political and social repercussions it can have for the owner of the information. Despite the risks and possible threats, the information must remain available to users, so alternatives are needed to protect against, detect, and prevent the leakage of sensitive information. A particular case of this problem is the leakage of sensitive textual documents. The identification of unstructured sensitive information remains a problem without a fully satisfactory solution, despite the development of methods and applications with promising results. It is therefore necessary to keep developing methods that contribute to an effective solution, based on a critical analysis of existing techniques and their future projections. In this work we start from a taxonomy of the approaches that have been applied to this problem. From the taxonomy, the critical analysis of the techniques and, above all, the practical needs, we propose a method for determining the sensitivity of textual documents from the perspective of logical combinatorial pattern recognition. The problem is treated as a supervised classification problem with two classes: sensitive and non-sensitive textual documents. The proposal is the STClass method, which consists of two phases: a training phase, in which the parameters for classification are defined, and a classification phase. On the datasets used, 96% of the documents were classified correctly.
Typical testors are useful tools for feature selection and for determining feature relevance in supervised classification problems. Computing all typical testors of a training matrix is currently very expensive; all reported algorithms have exponential complexity in the number of columns of the matrix. In this paper, we introduce a faster version of the BR (Boolean Recursive) algorithm, called fast-BR, which is based on the elimination of gaps and the reduction of columns. The fast-BR algorithm is designed to generate all typical testors of a training matrix with a reduced number of operations. Experimental results with this fast implementation, and a comparison with other state-of-the-art algorithms that generate typical testors, are presented.
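To make the notion of "all typical testors" concrete, the following is a minimal brute-force sketch, not the fast-BR algorithm. It assumes a 0/1 comparison matrix in which each row compares two objects from different classes, and it uses the usual definition: a testor is a set of columns that leaves no all-zero row, and a testor is typical when no column can be removed from it. The function names `is_testor` and `typical_testors` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_testor(matrix, cols):
    """Return True if every row has at least one 1 in the selected columns."""
    return all(any(row[c] for c in cols) for row in matrix)


def typical_testors(matrix):
    """Enumerate all typical (irreducible) testors by exhaustive search.

    Subsets are visited by increasing size, so a testor is typical exactly
    when it contains none of the typical testors found so far.
    The cost is exponential in the number of columns (illustration only).
    """
    n_cols = len(matrix[0])
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            if not is_testor(matrix, cols):
                continue
            if any(set(t) <= set(cols) for t in found):
                continue  # contains a smaller testor, hence not typical
            found.append(cols)
    return found


# Assumed toy comparison matrix: 3 rows (object pairs), 3 columns (features).
example = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
print(typical_testors(example))
```

In this toy matrix no single column covers all rows, while every pair of columns does, so the typical testors are exactly the three column pairs. The exhaustive enumeration is what makes the general problem expensive and motivates faster algorithms.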
In this paper, we introduce a fast implementation of the CT_EXT algorithm for testor property identification that is based on an accumulative binary tuple. The fast implementation of the CT_EXT algorithm (one of the fastest algorithms reported) is designed to generate all typical testors of a training matrix with a reduced number of operations. Experimental results with this fast implementation, and a comparison with other state-of-the-art algorithms that generate typical testors, are presented.
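One plausible reading of an "accumulative binary tuple" is sketched below. This is an assumption made for illustration, not the authors' implementation: each column of a 0/1 comparison matrix is encoded as a bit mask over the rows, the masks of the selected columns are combined with bitwise OR, and the testor property holds once every row bit is set. The names `column_masks`, `contributes` and `is_testor_mask` are hypothetical.

```python
def column_masks(matrix):
    """Encode each column as an int whose bit i holds the value in row i."""
    n_cols = len(matrix[0])
    return [
        sum(row[c] << i for i, row in enumerate(matrix))
        for c in range(n_cols)
    ]


def contributes(acc, col_mask):
    """A column contributes if it covers at least one row not yet covered."""
    return (acc | col_mask) != acc


def is_testor_mask(acc, n_rows):
    """The accumulated columns form a testor when all row bits are set."""
    return acc == (1 << n_rows) - 1


# Assumed toy comparison matrix: 3 rows, 3 columns.
example = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
masks = column_masks(example)

acc = 0
for col in (0, 1):  # accumulate columns 0 and 1
    if contributes(acc, masks[col]):
        acc |= masks[col]

print(is_testor_mask(acc, len(example)))
```

Keeping a running mask means that testing whether one more column yields a testor costs a single OR and a comparison, instead of a rescan of every row.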
ISBN (print): 0769525695
Shape-of-object representation has always been an important topic in image processing and pattern recognition. This work deals with the representation of object shapes and with approaches to recognizing objects. Several invariant techniques are widely used to represent an object because they preserve information and allow considerable data reduction. In this paper, a new approach based on a code representation and testor theory is presented. The proposed method is invariant under translation, scaling and rotation. The paper also discusses the capabilities of the method in recognizing objects. In addition, results on classes of simple figures are shown.
The so-called logical combinatorial approach to pattern recognition is presented, and works (mainly in Spanish and Russian) that are not ordinarily available are made accessible to the Western reader. The use of this approach for supervised and unsupervised pattern recognition, and for feature selection, is reviewed. A unified notation describing the original contributions is also presented, making this important area more readable. Our review is not exhaustive; nevertheless, the most significant works are included. Our hope is to motivate the reader to inquire further into these works. This paper serves as an introduction to three articles on the logical combinatorial approach that appear in this issue of Pattern Recognition. (C) 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
In this paper, a new conceptual algorithm for the conceptual analysis of mixed incomplete data sets is introduced. It is a tool, based on logical combinatorial pattern recognition (LCPR), for the conceptual structuralization of spaces. Starting from the limitations of existing conceptual algorithms, our laboratories are working on applying the methods, the techniques and, in general, the philosophy of logical combinatorial pattern recognition to overcome those limitations. An extension of Michalski's concept of l-complex to any similarity measure, a generalization operator for symbolic variables, and an extension of Michalski's refunion operator are introduced. Finally, the performance of the RGC algorithm is analyzed, and a comparison with several known conceptual algorithms is presented. (C) 2002 Elsevier Science Ltd. All rights reserved.
In this paper, the GK* model for the construction of thesaurus classes is presented. The model is based on a fuzzy semantic association measure between index terms and concepts (thesaurus classes). The association measure is obtained from fuzzy semantic relations between index terms, and it is used to cluster index terms into concepts. A hierarchical algorithm is introduced and illustrated on a simple numerical example. (C) 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.