ISBN (print): 9783319108407; 9783319108391
We consider an open issue in the design of pattern classifiers: whether to select the best classifier from a given ensemble or to combine all the available ones using a trainable fusion rule. While the latter choice can in principle outperform the former, the actual effectiveness of both is affected by small sample size problems. This raises the need to investigate under which conditions one choice is better than the other. We provide a first contribution by deriving an analytical expression for the expected error probability of best classifier selection, and by comparing it with that of a well-known linear fusion rule implemented with the Fisher linear discriminant.
In this paper, we propose a novel dictionary learning method for hyperspectral image classification. The proposed method, linear regression Fisher discrimination dictionary learning (LRFDDL), obtains a more discriminative dictionary and a classifier by incorporating a linear regression term and the Fisher discrimination criterion into the objective function during training. The linear regression term makes the predicted and actual labels as close as possible, while the Fisher discrimination is imposed on the sparse codes so that they have small within-class scatter but large between-class scatter. Experiments show that LRFDDL significantly improves the performance of hyperspectral image classification.
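The Fisher criterion mentioned above can be illustrated with a small sketch. It computes the traces of the within-class and between-class scatter for a set of code vectors. The function name `scatter_traces` and the toy data are hypothetical, and the abstract does not give the exact form of the term used in the LRFDDL objective, so this only shows the two quantities being traded off.

```python
def scatter_traces(codes, labels):
    """Return (trace of within-class scatter, trace of between-class scatter).

    codes  : list of equal-length numeric vectors (stand-ins for sparse codes)
    labels : class label of each vector
    """
    dim = len(codes[0])

    def mean(vectors):
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall_mean = mean(codes)
    within = 0.0
    between = 0.0
    for c in set(labels):
        members = [v for v, y in zip(codes, labels) if y == c]
        class_mean = mean(members)
        # Spread of the class around its own mean.
        within += sum(sq_dist(v, class_mean) for v in members)
        # Distance of the class mean from the overall mean, weighted by class size.
        between += len(members) * sq_dist(class_mean, overall_mean)
    return within, between
```

A discriminative set of codes, in this sense, has a small first value and a large second one.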
Diabetic retinopathy is a chronic, progressive eye disease associated with a group of eye problems that arise as a complication of diabetes. This disease may cause severe vision loss or even blindness. Specialists analyze fundus images in order to diagnose it and to prescribe specific treatments. Fundus images are photographs of the retina taken with a retinal camera; this noninvasive medical procedure provides a way to analyze the retina in patients with diabetes. The correct classification of these images depends on the ability and experience of the specialists, and also on the quality of the images. In this paper we present a method for diabetic retinopathy detection. The method is divided into two stages: in the first, we use local binary patterns (LBP) to extract local features; in the second, we apply artificial neural networks, random forests and support vector machines to the detection task. Preliminary results show that random forest was the best classifier, with 97.46% accuracy on a data set of 71 images.
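As an illustration of the feature extraction stage, the following is a minimal sketch of the basic 3x3 local binary pattern operator followed by a 256-bin histogram. The abstract does not say which LBP variant, radius or neighbourhood was used, so the 8-neighbour form, the bit ordering and the function name `lbp_histogram` are assumptions made for this example.

```python
def lbp_histogram(image):
    """Histogram of basic 3x3 LBP codes over the interior pixels of a grayscale image.

    image : list of rows, each a list of intensities (at least 3x3).
    """
    # Clockwise neighbour offsets starting at the top-left pixel (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

The resulting histogram is the kind of feature vector that could then be passed to a classifier such as a random forest.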
A Zero-Latency Data Warehouse (ZLDW) cannot be built on a standard ETL process, whose time frames limit access to current data and prevent users' needs from being taken into account. Therefore, after an in-depth analysis of this issue and of related workload balancing issues, an innovative system based on a Workload Balancing Unit (WBU) was created. In this paper we present an innovative workload balancing algorithm, CTBE (Choose Transaction By Election), which analyzes all incoming transactions and creates a schema of the dependencies between them. In addition, a cache in the WBU makes it possible to store information on incoming transactions and to exchange messages with the systems that transmit updates and users' queries. With this work we intend to present an innovative system designed to support a Zero-Latency Data Warehouse.
Learning from imbalanced multilabel data is a challenging task that has attracted considerable attention lately. Some resampling algorithms used in traditional classification, such as random undersampling and random oversampling, have already been adapted to work with multilabel datasets. In this paper MLeNN (MultiLabel edited Nearest Neighbor), a heuristic multilabel undersampling algorithm based on the well-known Wilson's Edited Nearest Neighbor Rule, is proposed. The samples to be removed are heuristically selected instead of randomly picked. The ability of MLeNN to improve classification results is experimentally tested, and its performance against multilabel random undersampling is analyzed. As will be shown, MLeNN is a competitive multilabel undersampling alternative, able to significantly enhance classification results.
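For orientation, here is a minimal sketch of the classic single-label Wilson Edited Nearest Neighbor rule on which MLeNN is based. It is not MLeNN itself: the multilabel selection heuristics are not described in the abstract and are not reproduced here. The function name, the squared Euclidean distance and the default `k=3` are assumptions made for this example.

```python
from collections import Counter

def edited_nearest_neighbor(points, labels, k=3):
    """Return the indices of samples kept by Wilson's ENN rule.

    A sample is kept only if the majority label among its k nearest
    neighbours (excluding itself) agrees with its own label.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    kept = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Indices of the k closest other samples.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sq_dist(p, points[j]),
        )[:k]
        majority = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        if majority == y:
            kept.append(i)
    return kept
```

When used for undersampling, the rule is typically applied so that only majority-class samples are candidates for removal. This sketch applies it to every sample.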
Interlinking different data sources has become a crucial task due to the explosion of diverse, heterogeneous information repositories in the so-called Web of Data. In this paper an approach to extract relationships between entities in huge Linked Data sources is presented. Our approach hinges on the Map-Reduce processing framework and on context-based ontology matching techniques, so as to discover the maximum number of possible relationships between entities in different data sources in a computationally efficient fashion. To this end, the processing flow is composed of three Map-Reduce jobs in charge of 1) the collection of linksets between datasets; 2) context generation; and 3) the construction of entity pairs and similarity computation. In order to assess the performance of the proposed scheme, an exemplifying prototype is implemented between the DBpedia and LinkedMDB datasets. The obtained results are promising and pave the way towards benchmarking the proposed interlinking procedure against other ontology matching systems.
Nowadays, large collections of digital images are being created. Many of these collections are the product of digitizing existing collections of analogue photographs, diagrams, drawings, paintings, and prints. Content-Based Image Retrieval is a solution for managing this information. Image retrieval that combines low-level perception (color, texture and shape) with high-level perception is an emerging and wide area of research. In this paper, we present a new semantic approach based on shape extraction refined with texture and color feature extraction, using 2D Beta Wavelet Network (2D BWN) modeling. The shape descriptor is based on the Best Detail Coefficients (BDC), the texture descriptor is based on the Best Approximation Coefficients (BAC), and the color descriptor is calculated on the approximated image by applying the first two moments. Experimental results on the Wang database show the effectiveness of the proposed method.
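The color descriptor can be illustrated with a small sketch that computes the first two moments of each color channel. It assumes that "the first two moments" means the per-channel mean and standard deviation, which is the usual convention for color moments but is not spelled out in the abstract. The function name `color_moments` is hypothetical, and the input here is a plain list of pixels rather than the wavelet-approximated image.

```python
from statistics import fmean, pstdev

def color_moments(pixels):
    """Return [(mean, standard deviation), ...], one pair per color channel.

    pixels : list of equal-length tuples, e.g. (R, G, B) values.
    """
    # Transpose the pixel list so that each entry holds one whole channel.
    channels = list(zip(*pixels))
    return [(fmean(ch), pstdev(ch)) for ch in channels]
```

For an RGB image this yields a six-value descriptor, two values per channel.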
Imbalanced data constitutes a great difficulty for most classifier learning algorithms. However, as recent works claim, class imbalance is not a problem in itself, and performance degradation is also associated with other factors related to the distribution of the data, such as the presence of noisy and borderline examples in the areas surrounding class boundaries. This contribution proposes to extend SMOTE with a noise filter called the Iterative-Partitioning Filter (IPF), which can overcome these problems. The properties of this proposal are discussed in a controlled experimental study against SMOTE and its most well-known generalizations. The results show that the new proposal performs better than existing SMOTE generalizations in all these different scenarios.
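The oversampling step that the proposal builds on can be sketched as follows. This shows only the standard SMOTE interpolation between a minority sample and one of its nearest minority neighbours; the IPF noise filter is not implemented here. The function name, the fixed seed and the default `k=2` are assumptions made for this example.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority : list of numeric tuples (needs at least two samples)
    k        : number of nearest minority neighbours to choose from
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # The k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(x, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic
```

Because each synthetic point lies on a segment between two existing samples, noisy or borderline minority samples can produce synthetic points inside the majority region, which is the kind of problem a subsequent noise filter is meant to address.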
The aim of this study is to predict the death of polytraumatized patients from epidemiological, clinical and health treatment variables by means of data mining methods. The main problems to be addressed were high dimensionality and imbalanced data. Since the techniques usually used to deal with these drawbacks, namely feature selection methods and sampling strategies respectively, did not provide satisfactory results, the study set out to identify the data mining algorithms that behave best in this kind of scenario. The study was carried out with data from 497 patients diagnosed with severe trauma who were hospitalized in the Intensive Care Unit (ICU) of the University Hospital of Salamanca. The results reveal that multiclassifiers behave better than simple classifiers in contexts of high dimensionality and imbalanced datasets, without the need to resort to undersampling and oversampling strategies, which can lead to the loss of valuable data and to overfitting problems respectively.
Finding all frequent itemsets (patterns) in a given database is a challenging process that in general consumes time and space. Time is measured in terms of the number of database scans required to produce all frequent itemsets. Space is consumed by the potential frequent itemsets that end up classified as not frequent. To overcome both limitations, namely space and time, we propose a novel approach for generating all possible frequent itemsets. It introduces a new representation in which items are arranged into groups of four and, within each group, each item is assigned one of four prime numbers, namely 2, 3, 5, and 7. The reported results demonstrate the applicability and effectiveness of the proposed approach. Our approach scales in terms of the number of transactions and the number of items.
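One way to read the prime-number representation is sketched below. The abstract does not explain how the primes are used, so this example assumes that each group of four items is encoded as the product of the primes of the items present, and that itemset containment is tested by divisibility. The function names and this encoding are illustrative assumptions, not the authors' algorithm.

```python
PRIMES = (2, 3, 5, 7)  # the four primes named in the abstract

def encode(itemset, items):
    """Encode an itemset as one integer per group of four items.

    items : ordered list of all items; item i belongs to group i // 4
            and is assigned PRIMES[i % 4] (assumed assignment).
    """
    n_groups = (len(items) + 3) // 4
    codes = [1] * n_groups
    for i, item in enumerate(items):
        if item in itemset:
            codes[i // 4] *= PRIMES[i % 4]
    return tuple(codes)

def contains(transaction_code, itemset_code):
    """True if every item of the itemset occurs in the transaction."""
    return all(t % s == 0 for t, s in zip(transaction_code, itemset_code))

def support(itemset, transactions, items):
    """Count the transactions that contain the whole itemset."""
    target = encode(itemset, items)
    return sum(contains(encode(t, items), target) for t in transactions)
```

Under this assumed encoding, each group code is at most 2 * 3 * 5 * 7 = 210, so a group fits in a single byte.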