ISBN:
(Print) 9781467386753
There are several current systems developed to identify common skin lesions such as eczema that utilize image processing, and most of these apply feature extraction techniques and machine learning algorithms. These systems extract features from pre-processed images and use them to identify skin lesions, with machine learning as the core. This paper presents the design and evaluation of a system that implements a multi-model, multi-level architecture based on the Artificial Neural Network (ANN) for eczema detection. In this work, a multi-model system is defined as an architecture with different models depending on the input characteristics. The outputs of these models are integrated by a decision layer (hence multi-level), which computes the probability of an eczema case. The resulting system achieves a 68.37% average confidence level, as opposed to 63.01% for the single-level, i.e. single-model, system in the actual testing of eczema versus non-eczema cases. Furthermore, the multi-model, multi-level design produces more stable models in the training phase, wherein overfitting is reduced.
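As a rough illustration of the multi-model, multi-level idea described above, the sketch below trains two hypothetical sub-networks on separate feature views (the colour and texture features are invented for the example) and fuses their outputs in a simple averaging decision layer; scikit-learn's MLPClassifier stands in for the paper's ANN architecture and is not the authors' exact design.

```python
# Minimal sketch of a multi-model, multi-level classifier: two sub-models are
# trained on different feature views and a decision layer averages their
# predicted probabilities. Features and model settings are illustrative only.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features: colour histogram vs. texture descriptors.
X_colour = rng.normal(size=(200, 16))
X_texture = rng.normal(size=(200, 24))
y = rng.integers(0, 2, size=200)          # 1 = eczema, 0 = non-eczema

# Level 1: one ANN per input characteristic (the "multi-model" part).
model_colour = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X_colour, y)
model_texture = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X_texture, y)

# Level 2: decision layer fuses the sub-model outputs (the "multi-level" part).
def eczema_probability(x_colour, x_texture):
    p1 = model_colour.predict_proba(x_colour.reshape(1, -1))[0, 1]
    p2 = model_texture.predict_proba(x_texture.reshape(1, -1))[0, 1]
    return (p1 + p2) / 2                   # simple averaging decision layer

print(eczema_probability(X_colour[0], X_texture[0]))
```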
ISBN:
(Print) 9781467386753
Clustering dynamic data is a challenge in identifying and forming groups. This unsupervised learning usually leads to undirected knowledge discovery. A cluster detection algorithm searches for clusters of data that are similar to one another using similarity measures. Determining the algorithm that yields the best-optimized clusters can be an issue: depending on the parameters and attributes of the data, the results of K-Means and K-Medoids can vary. This paper presents a comparative analysis of both algorithms on different data clusters to lay out the strengths and weaknesses of each. Thorough studies were conducted to determine the correlation of the data with the algorithms and to find the relationship among them.
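The snippet below is a minimal sketch of the kind of side-by-side evaluation described: scikit-learn's KMeans against a bare-bones K-Medoids heuristic on synthetic data, scored with the silhouette measure. The data, the medoid-update rule and the evaluation measure are illustrative choices, not the paper's experimental setup.

```python
# Illustrative comparison of K-Means and a simple K-Medoids on the same data.
# The K-Medoids below is a bare-bones alternating heuristic, not a full PAM
# implementation; it only shows how the two algorithms differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 3, 6)])

def k_medoids(X, k, iters=20):
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - X[medoids][None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # new medoid = member minimising total distance to the others
            within = np.linalg.norm(X[members][:, None] - X[members][None], axis=2).sum(axis=1)
            medoids[j] = members[within.argmin()]
    return labels

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
kmed_labels = k_medoids(X, k=3)

print("k-means silhouette:   %.3f" % silhouette_score(X, km_labels))
print("k-medoids silhouette: %.3f" % silhouette_score(X, kmed_labels))
```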
ISBN:
(Print) 9781479999255
The proceedings contain 401 papers. The topics discussed include: how big data changes statistical machine learning;concept hierarchies and human navigation;learning to accurately COUNT with query-driven predictive analytics;practical message-passing framework for large-scale combinatorial optimization;iteratively refining SVMs using priors;online and on-demand partitioning of streaming graphs;rewriting complex SPARQL analytical queries for efficient cloud-based processing;towards scalable quantile regression trees;user-curated image collections: modeling and recommendation;SyntacticDiff: operator-based transformation for comparative text mining;scalable classification for large dynamic networks;CINTIA: a distributed, low-latency index for big interval data;inferring crowd-sourced venues for tweets;revealing the fog-of-war: a visualization-directed, uncertainty-aware approach for exploring high-dimensional data;and visual analysis of bi-directional movement behavior.
ISBN:
(Print) 9783319210247; 9783319210230
We define a heterogeneous dataset as a set of complex objects, that is, objects defined by several data types including structured data, images, free text or time series; we envisage this could be extended to other data types. There are currently research gaps in how to deal with such complex data. In our previous work, we proposed an intermediary fusion approach called SMF, which produces a pairwise matrix of distances between heterogeneous objects by fusing the distances between the individual data types. More precisely, SMF aggregates partial distances that are computed separately from each data type, taking uncertainty into consideration. Consequently, a single fused distance matrix is produced that can be used with a standard clustering algorithm. In this paper we extend the practical work by evaluating SMF using the k-means algorithm to cluster heterogeneous data. We used a dataset of prostate cancer patients in which objects are described by two basic data types, namely structured and time-series data. We assess the clustering results using external validation against multiple possible classifications of our patients. The results show that the SMF approach can improve the clustering configuration when compared with clustering on an individual data type.
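A minimal sketch of the intermediary-fusion step follows, assuming two invented data types per object (a structured feature vector and a short time series): partial distance matrices are rescaled and averaged into one fused matrix, which is then embedded with MDS so that k-means can be applied. The plain weighted average stands in for SMF's uncertainty-aware aggregation and is not the authors' exact procedure.

```python
# Sketch of an intermediary-fusion step in the spirit of SMF: partial distance
# matrices computed per data type are combined into one fused matrix, which is
# then embedded and clustered with k-means.
import numpy as np
from scipy.spatial.distance import squareform, pdist
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n = 60

# Hypothetical heterogeneous objects: structured features + a short time series.
structured = rng.normal(size=(n, 5))
time_series = rng.normal(size=(n, 30))

# Partial distances per data type (Euclidean here; any suitable metric works).
d_structured = squareform(pdist(structured))
d_series = squareform(pdist(time_series))

# Fuse after rescaling so one data type does not dominate.
def normalise(d):
    return d / d.max()

fused = 0.5 * normalise(d_structured) + 0.5 * normalise(d_series)

# k-means needs a feature space, so embed the fused distances first.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(fused)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(labels[:10])
```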
ISBN:
(Digital) 9783319258164
ISBN:
(Print) 9783319258164; 9783319258157
Echocardiography (Echo) reports of patients with pediatric heart disease contain much disease-related information that provides great support to physicians for clinical decisions, such as customizing treatment based on the risk level of the specific patient. With the help of natural language processing (NLP), this information can be automatically extracted from free-text reports, and the resulting structured data are much easier to analyze with existing data mining approaches. In this study, we extract entity/anatomic site-feature-value (EFV) triples from the Echo reports and predict the risk level on this basis. The prediction accuracies of machine learning and rule-based methods are compared on manually prepared gold-standard data to explore the application of automatic knowledge extraction to clinical decision support.
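A toy, rule-based illustration of EFV extraction is sketched below; the site and feature vocabularies and the example sentence are invented, and a real system would rely on a curated clinical lexicon rather than a single regular expression.

```python
# Toy rule-based extraction of entity/site-feature-value (EFV) triples from a
# free-text report sentence. Vocabulary and pattern are made up for the example.
import re

SITES = ["left ventricle", "right atrium", "mitral valve"]
FEATURES = ["diameter", "regurgitation", "ejection fraction"]

pattern = re.compile(
    r"(?P<site>%s)\s+(?P<feature>%s)\s*(?:is|of|=)?\s*(?P<value>[\w%%]+)"
    % ("|".join(SITES), "|".join(FEATURES)),
    re.IGNORECASE,
)

report = "Left ventricle ejection fraction is 62%. Mitral valve regurgitation mild."
triples = [(m.group("site"), m.group("feature"), m.group("value"))
           for m in pattern.finditer(report)]
print(triples)
# e.g. [('Left ventricle', 'ejection fraction', '62%'),
#       ('Mitral valve', 'regurgitation', 'mild')]
```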
ISBN:
(Digital) 9783319273402
ISBN:
(Print) 9783319273402; 9783319273396
Data mining emerges in response to technological advances and addresses the treatment of large amounts of data. The aim of data mining is the extraction of new, valid, comprehensible and useful knowledge through the construction of a simple model that describes the data and can also be used in prediction tasks. The challenge of extracting knowledge from data is interdisciplinary and draws upon research in statistics, pattern recognition and machine learning, among others. A common technique for identifying natural groups hidden in data is clustering, a process that automatically discovers structure in data and does not require any supervision. The model's performance relies heavily on the choice of an appropriate measure: it is important to use a suitable similarity metric to measure the proximity between two objects, but the separability of clusters must also be taken into account. This paper addresses the problem of comparing two or more sets of overlapping data as a basis for comparing different partitions of quantitative data. An approach that uses statistical concepts to measure the distance between partitions is presented. The data's descriptive knowledge is expressed by means of a boxplot, which allows for the construction of clusters that take conditional probabilities into account.
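The sketch below only illustrates the general idea of describing groups by their boxplot statistics and quantifying how much two groups of quantitative data overlap; it is a conceptual illustration, not the distance between partitions proposed in the paper.

```python
# Describe each group by its boxplot statistics (quartiles) and use
# interquartile-range overlap as a crude indication of separability.
import numpy as np

def box_summary(values):
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    return {"q1": q1, "median": q2, "q3": q3}

def iqr_overlap(a, b):
    """Fraction of the combined interquartile span covered by both boxes."""
    sa, sb = box_summary(a), box_summary(b)
    lo = max(sa["q1"], sb["q1"])
    hi = min(sa["q3"], sb["q3"])
    span = max(sa["q3"], sb["q3"]) - min(sa["q1"], sb["q1"])
    return max(0.0, hi - lo) / span if span > 0 else 1.0

rng = np.random.default_rng(3)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=200)
cluster_b = rng.normal(loc=2.5, scale=1.0, size=200)
print("IQR overlap:", round(iqr_overlap(cluster_a, cluster_b), 3))
```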
ISBN:
(Digital) 9783319265353
ISBN:
(Print) 9783319265353; 9783319265346
In the machine learning area, as the number of labeled input samples becomes very large, it is very difficult to build a classification model because the input data set does not fit in memory during the training phase of the algorithm; it is therefore necessary to use data partitioning to handle the overall data set. Bagging- and boosting-based data partitioning methods have been broadly used in data mining and pattern recognition. Both of these methods have shown great potential for improving classification model performance. This study is concerned with the analysis of data set partitioning with noise removal and its impact on the performance of multiple classifier models. We propose noise-filtering preprocessing at each data set partition to increase classifier model performance, and we apply a Gini impurity approach to find the best split percentage for the noise filter ratio. The filtered sub data sets are then used to train the individual ensemble models.
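A simplified version of such a pipeline might look like the sketch below: bootstrap partitions of the training data are cleaned with a k-NN disagreement filter before one tree is trained per partition and the trees vote. The noise filter and the fixed partition count are stand-ins for the paper's Gini-impurity-guided choice of the noise-filter ratio.

```python
# Partition the training data, filter likely label noise inside each partition,
# train one model per cleaned partition, and combine them by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def filter_noise(Xp, yp, k=5):
    """Drop samples whose label disagrees with the local k-NN consensus."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(Xp, yp)
    keep = knn.predict(Xp) == yp
    return Xp[keep], yp[keep]

# Bagging-style partitions (bootstrap samples), each cleaned before training.
rng = np.random.default_rng(0)
models = []
for _ in range(5):
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
    Xp, yp = filter_noise(X_tr[idx], y_tr[idx])
    models.append(DecisionTreeClassifier(random_state=0).fit(Xp, yp))

# Majority vote over the binary predictions of the ensemble members.
votes = np.array([m.predict(X_te) for m in models])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("ensemble accuracy: %.3f" % accuracy_score(y_te, ensemble_pred))
```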
ISBN:
(Digital) 9783319243092
ISBN:
(Print) 9783319243085
This book constitutes the refereed proceedings of the 14th International Conference of the Italian Association for Artificial Intelligence, AI*IA 2015, held in Ferrara, Italy, in September 2015. The 35 full papers presented were carefully reviewed and selected from 44 submissions. The papers are organized in topical sections on swarm intelligence and genetic algorithms; computer vision; multi-agent systems; knowledge representation and reasoning; machine learning; semantic Web; natural language; and scheduling, planning and robotics.
ISBN:
(Print) 9781479956807
Heart failure is among the top causes of death worldwide; the number of deaths from heart failure exceeds the number of deaths resulting from any other cause. Recent studies have focused on the use of machine learning techniques to develop predictive models that are able to predict the incidence of heart failure. The majority of these studies have used a binary output class, in which the prediction is either the presence or absence of heart failure. In this study, a multi-level risk assessment of developing heart failure is proposed, in which five risk levels of heart failure can be predicted using the C4.5 decision tree classifier. In addition, we boost the early prediction of heart failure by involving three main risk factors with the heart failure data set. Our predictive model shows an improvement over existing studies, with 86.5% sensitivity, 95.5% specificity, and 86.53% accuracy.
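As a hedged sketch of the multi-level risk formulation, the code below trains a decision tree to assign one of five risk levels on synthetic data and reports per-class sensitivity and specificity; scikit-learn's CART tree with an entropy criterion is used only as a stand-in for the C4.5 classifier named in the abstract, and the data are not the heart failure data set.

```python
# Five-level risk prediction with a decision tree, evaluated with per-class
# sensitivity and specificity derived from the confusion matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=6, random_state=0)
clf.fit(X_tr, y_tr)

cm = confusion_matrix(y_te, clf.predict(X_te))
for level in range(5):
    tp = cm[level, level]
    fn = cm[level].sum() - tp
    fp = cm[:, level].sum() - tp
    tn = cm.sum() - tp - fn - fp
    print("risk level %d  sensitivity %.2f  specificity %.2f"
          % (level, tp / (tp + fn), tn / (tn + fp)))
```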
ISBN:
(Print) 9783319257884
The proceedings contain 27 papers. The special focus in this conference is on Statistical Language and Speech Processing. The topics include: Towards two-way interaction with reading machines;the prediction of fatigue using speech as a biosignal;supertagging for a statistical HPSG parser for Spanish;residual-based excitation with continuous F0 modeling in HMM-based speech synthesis;discourse particles in French;effects of evolutionary linguistics in text classification;evaluation of the impact of corpus phonetic alignment on the HMM-based speech synthesis quality;decoding distributed tree structures;combining continuous word representation and prosodic features for ASR error prediction;semi-extractive multi-document summarization via submodular functions;a comparison of human and machine estimation of speaker age;acoustical frame rate and pronunciation variant statistics;the influence of boundary depth on phrase-final lengthening in Russian;automatic detection of voice disorders;semantic features for dialogue act recognition;conversational telephone speech recognition for Lithuanian;long-term statistical feature extraction from speech signal and its application in emotion recognition;rhythm-based syllabic stress learning without labelled data;unsupervised and user feedback based lexicon adaptation for foreign names and acronyms;combining lexical and prosodic features for automatic detection of sentence modality in French;corpus based methods for learning models of metaphor in modern Greek;probabilistic speaker pronunciation adaptation for spontaneous speech synthesis using linguistic features;weakly supervised discriminative training of linear models for natural language processing and merging of native and non-native speech for low-resource accented ASR.