Group work is widespread in education. The growing use of online tools supporting group work generates huge amounts of data. We aim to exploit this data to support mirroring: presenting useful high-level views of information about the group, together with desired patterns characterizing the behavior of strong groups. The goal is to enable groups and their facilitators to see relevant aspects of the group's operation, to provide feedback on whether these are more likely to be associated with positive or negative outcomes, and to indicate where the problems are. We explore how useful mirror information can be extracted via a theory-driven approach and a range of clustering and sequential pattern mining techniques. The context is a senior software development project where students use the collaboration tool TRAC. We extract patterns distinguishing the better from the weaker groups and gain insights into the success factors. The results point to the importance of leadership and group interaction, and give promising indications of whether they are occurring. Patterns indicating good individual practices were also identified. We found that some key measures can be mined from early data. The results are promising for advising groups at the start and for early identification of effective and poor practices, in time for remediation.
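As a rough illustration of the kind of pattern extraction mentioned in this abstract, the sketch below counts contiguous event subsequences that recur across several activity logs. It is a simplified stand-in for sequential pattern mining, not the authors' method: the function name, the event labels ("ticket", "wiki", "commit") and the support threshold are all assumptions made for the example.

```python
from collections import Counter


def frequent_subsequences(sessions, length=2, min_support=2):
    """Return contiguous event subsequences that occur in at least
    `min_support` sessions, mapped to the number of supporting sessions.

    `sessions` is a list of event lists (hypothetical activity logs).
    """
    support = Counter()
    for events in sessions:
        # Collect each distinct window once per session, so support
        # counts sessions rather than raw occurrences.
        windows = {
            tuple(events[i:i + length])
            for i in range(len(events) - length + 1)
        }
        support.update(windows)
    return {pattern: n for pattern, n in support.items() if n >= min_support}
```

Comparing the patterns returned for stronger and weaker groups is one simple way such counts could be turned into mirror information.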
ISBN: 9783642030697 (print)
Integration methods for ensemble learning can use two different approaches: combination or selection. The combination approach (also called fusion) combines the predictions obtained by the different models in the ensemble to obtain the final ensemble prediction. The selection approach selects one (or more) models from the ensemble according to the prediction performance of these models on similar data from the validation set. Usually, similar data are selected using k-nearest neighbors with the Euclidean distance. In this paper we discuss other approaches to obtaining similar data for the regression problem. We show that using similarity measures according to the target values improves results. We also show that dynamically selecting several models for the prediction task increases prediction accuracy compared to selecting just one model.
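The baseline selection approach described in this abstract can be sketched as follows: find the k validation points nearest to the query in Euclidean distance, score each model by its mean squared error on those neighbors, and predict with the locally best model. This is a minimal sketch under assumptions. The function name, the representation of models as plain callables, and the use of mean squared error as the local performance measure are all choices made for the example, not details from the paper.

```python
import math


def select_and_predict(x, models, X_val, y_val, k=3):
    """Dynamically select one model for query point `x`.

    models : list of callables mapping a feature tuple to a number
    X_val  : list of validation feature tuples
    y_val  : list of validation targets
    Returns (index of the selected model, its prediction for x).
    """
    # Rank validation points by Euclidean distance to the query.
    order = sorted(range(len(X_val)), key=lambda i: math.dist(x, X_val[i]))
    neighbors = order[:k]

    def local_mse(model):
        # Mean squared error of one model on the k nearest neighbors.
        return sum((model(X_val[i]) - y_val[i]) ** 2 for i in neighbors) / len(neighbors)

    errors = [local_mse(m) for m in models]
    best = min(range(len(models)), key=errors.__getitem__)
    return best, models[best](x)
```

Selecting several models instead of one would amount to keeping the few lowest-error indices and averaging their predictions.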
ISBN: 9783642030697 (print)
Although a vast majority of inductive learning algorithms have been developed for handling concept drifting data streams, especially those based on ensemble classification models, few of them can adapt to the detection of the different types of concept drifts from noisy streaming data with limited time and space overheads. Motivated by this, a new classification algorithm for Concept drifting Detection based on an ensembling model of Random Decision Trees (called CDRDT) is proposed in this paper. Extensive studies with synthetic and real streaming data demonstrate that, in comparison to several classification algorithms for concept drifting data streams, CDRDT not only effectively and efficiently detects the potential concept changes in noisy data streams, but also performs much better in terms of runtime and space, with an improvement in predictive accuracy. Thus, our proposed algorithm provides a useful reference for classifying concept drifting data streams with noise in a lightweight way.
ISBN: 9783642030697 (print)
While classification models are being developed for many problems in medicine, Bayesian network classifiers do not seem to have become as widely accepted within the medical community as logistic regression models. We compare first-order logistic regression and naive Bayesian classification in the domain of reproductive medicine and demonstrate that the two techniques can result in models of comparable performance. For Bayesian network classifiers to become more widely accepted within the medical community, we feel that they should be better aligned with their context of application. We describe how to incorporate well-known concepts of clinical relevance in the process of constructing and evaluating Bayesian network classifiers to achieve such an alignment.
ISBN: 9781424481835 (print)
Data clustering has been applied in multiple fields such as machine learning, data mining, wireless sensor networks and pattern recognition. One of the most famous clustering approaches is K-means, which has been used effectively in many clustering problems, but this algorithm has some problems such as convergence to local optima and sensitivity to initial points. The artificial fish swarm algorithm (AFSA) is a swarm intelligence algorithm whose major application is solving optimization problems. Its characteristics include a high convergence rate and insensitivity to initial values. In this paper a hybrid clustering method based on the artificial fish swarm algorithm and K-means, called KAFSA, is proposed. In the proposed algorithm, K-means is used as one of the behaviors of artificial fish in AFSA. The proposed algorithm has been tested on five data sets and its efficiency was compared with particle swarm optimization (PSO), K-means and standard AFSA. Experimental results showed that the proposed approach has suitable and acceptable efficacy in data clustering.
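The abstract does not say how KAFSA embeds K-means into the fish behaviors, so the sketch below shows only the standard K-means (Lloyd) update that such a behavior could apply to a fish's candidate set of centroids. The function name and the handling of empty clusters (keeping the old centroid) are assumptions made for the example.

```python
import math


def kmeans_step(points, centroids):
    """One standard K-means iteration: assign every point to its nearest
    centroid, then move each centroid to the mean of its assigned points.

    points, centroids : lists of equal-length numeric tuples
    Returns the updated list of centroids.
    """
    clusters = [[] for _ in centroids]
    for p in points:
        # Index of the centroid closest to p in Euclidean distance.
        nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        clusters[nearest].append(p)

    updated = []
    for old, members in zip(centroids, clusters):
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            updated.append(tuple(old))
        else:
            dims = len(old)
            updated.append(tuple(sum(m[d] for m in members) / len(members) for d in range(dims)))
    return updated
```

Because a single step only refines the centroids it is given, the result depends on the starting positions, which is the initial-point sensitivity the abstract refers to.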
ISBN: 9783642030697 (print)
We consider the problem of learning classifiers from samples which have additional features that are absent due to noise or corruption of measurement. The common approach for handling missing features in discriminative models is first to complete their unknown values, and then to apply a standard classification algorithm to the completed data. In this paper, an algorithm which aims to maximize the margin of each sample in its own relevant subspace is proposed. We show how incomplete data can be classified directly, without completing any missing features, in a large-margin learning framework. Moreover, according to the theory of optimal kernel functions, we propose an optimal kernel function, a convex composition of a set of linear kernel functions, to measure the similarity between the additional features of each pair of samples. Based on the geometric interpretation of the margin, we formulate an objective function to maximize the margin of each sample in its own relevant subspace. In this formulation, we make use of the structural parameters trained from existing features and optimize only the structural parameters trained from additional features. A two-step iterative procedure for solving the objective function is proposed. By avoiding the pre-processing phase in which the data is completed, our algorithm could offer considerable computational savings. We demonstrate our results on a number of standard benchmarks from UCI, and the results show that our algorithm can achieve better or comparable classification accuracy compared to the existing algorithms.
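To make the idea of a margin "in its own relevant subspace" concrete, the sketch below computes the geometric margin of one sample under a linear classifier using only the features that are present for that sample. It is an illustration under assumptions, not the paper's formulation: the linear model, the use of None to mark an absent feature, the inclusion of a bias term, and the function name are all choices made for the example.

```python
import math


def subspace_margin(w, b, x, y):
    """Geometric margin of sample (x, y) restricted to its present features.

    w : list of weights, one per feature
    b : bias term (assumed to apply regardless of which features are present)
    x : list of feature values, with None marking an absent feature
    y : class label, +1 or -1
    """
    present = [i for i, value in enumerate(x) if value is not None]
    # Norm of the weight vector projected onto the sample's own subspace.
    norm = math.sqrt(sum(w[i] ** 2 for i in present))
    if norm == 0.0:
        return 0.0
    score = sum(w[i] * x[i] for i in present) + b
    return y * score / norm
```

Normalizing by the projected weight norm rather than the full norm means a sample is not penalized for weights on features it does not have, which is why no completion step is needed before the margin is evaluated.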
In this paper we address the problem of using bet selections of a large number of mostly non-expert users to improve sports betting tips. A similarity-based approach is used to describe individual users' strategie...
Several approaches and algorithms exist for solving data mining and machine learning tasks, for example, decision tree learning, artificial neural networks, Bayesian learning, instance-based learning,...
ISBN: 9783642030697 (print)
Organ transplantation is a highly complex decision process that requires expert decisions. The major problem in a transplantation procedure is the possibility that the receiver's immune system attacks and destroys the transplanted tissue. It is therefore of capital importance to find a donor with the highest possible compatibility with the receiver, and thus reduce rejection. Finding a good donor is not a straightforward task because a complex network of relations exists between the immunological and the clinical variables that influence the receiver's acceptance of the transplanted organ. Currently the process of analyzing these variables involves a careful study by the clinical transplant team. The number and complexity of the relations between variables make the manual process very slow. In this paper we propose and compare two machine learning algorithms that might help the transplant team in improving and speeding up their decisions. We achieve that objective by analyzing past real cases and constructing models as sets of rules. Such models are accurate and understandable by experts.
With the advent of the information age, especially with the rapid development of networks, the "information explosion" problem has emerged. How to improve the classifier's training precision steadily with accumul...