Decision trees are an important method in both induction research and data mining, used mainly for classification and prediction. WEKA is software capable of working with various decision tree ...
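A central step in decision-tree induction is choosing which attribute to split on. The sketch below assumes categorical attributes and computes information gain, the split criterion of ID3 (C4.5, which WEKA provides as J48, refines it into the gain ratio). The toy attributes `outlook` and `windy` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels, attrs):
    """Attribute with the highest information gain (ID3-style choice)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(best_split(rows, labels, ["outlook", "windy"]))  # prints "outlook"
```

Here `outlook` separates the classes perfectly (gain of 1 bit) while `windy` carries no information (gain 0), so a tree builder would split on `outlook` first and recurse on each branch.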
A decision support system based on data mining (DM) and Bayesian belief networks (BBN) is proposed to predict student learning outcomes; it takes a calculus course as an example to help students overcome their learning difficulties. A total of 427 freshmen at Ming Chi University of Technology (Taiwan) completed questionnaires for this study. The methodology involves four steps: fuzzy theory to identify the factors affecting learning outcomes; data mining to construct the influence diagram; machine learning to establish the probability tables in the BBN; and the resulting model to predict exam scores at the beginning of the course, thereby helping students improve their scores by addressing their weaknesses.
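The third step, filling a BBN's conditional probability tables from data, can be illustrated by counting. This is a minimal sketch assuming discrete variables and complete questionnaire records; the variable names `prior_math` and `score` are hypothetical, the add-one (Laplace) smoothing is our choice rather than necessarily the paper's, and only a single table with fully observed parents is shown (a full BBN would also need inference over unobserved nodes).

```python
from collections import Counter, defaultdict

def learn_cpt(records, parents, child):
    """Estimate P(child | parents) from complete, discrete records by counting,
    with add-one (Laplace) smoothing over the child's observed values."""
    child_values = sorted({r[child] for r in records})
    counts = defaultdict(Counter)
    for r in records:
        counts[tuple(r[p] for p in parents)][r[child]] += 1
    cpt = {}
    for key, c in counts.items():
        total = sum(c.values()) + len(child_values)
        cpt[key] = {v: (c[v] + 1) / total for v in child_values}
    return cpt

def predict(cpt, parent_values):
    """Most probable child value given fully observed parents.
    Raises KeyError for a parent combination never seen in training."""
    dist = cpt[tuple(parent_values)]
    return max(dist, key=dist.get), dist

# Hypothetical records: one parent factor and a pass/fail outcome.
records = [
    {"prior_math": "high", "score": "pass"},
    {"prior_math": "high", "score": "pass"},
    {"prior_math": "high", "score": "fail"},
    {"prior_math": "low", "score": "fail"},
]
cpt = learn_cpt(records, ["prior_math"], "score")
print(predict(cpt, ["high"]))  # ('pass', {'fail': 0.4, 'pass': 0.6})
```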
There is a large number of online product reviews on the Internet that need to be organized. In this paper, we consider the problem of sentiment classification of online reviews, that is, determining the overall semantic orientation of customer reviews. Our proposed method for review classification is a supervised machine learning method based on extracting product features and the polarity of the opinions expressed about those features.
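The pipeline described above (find product features, attach opinion polarity to them, then train a supervised classifier) can be sketched as follows. The lexicons, the fixed-window heuristic, and the choice of a perceptron as the learner are all assumptions for illustration; the abstract does not say which extraction rules or classifier the authors use.

```python
# Hypothetical, tiny lexicons; a real system would mine features from data.
PRODUCT_FEATURES = ["battery", "price", "screen"]
OPINION_WORDS = {"great": 1, "good": 1, "bad": -1, "poor": -1}

def feature_polarities(review, window=2):
    """For each product feature, sum the polarity of opinion words within
    `window` tokens of every mention of that feature."""
    tokens = review.lower().split()
    scores = dict.fromkeys(PRODUCT_FEATURES, 0)
    for i, tok in enumerate(tokens):
        if tok in scores:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            scores[tok] += sum(OPINION_WORDS.get(tokens[j], 0)
                               for j in range(lo, hi) if j != i)
    return [scores[f] for f in PRODUCT_FEATURES]

def train_perceptron(X, y, epochs=10):
    """Classic perceptron on feature vectors X with labels y in {+1, -1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
    return w, b

def classify(model, review):
    """+1 for an overall positive review, -1 otherwise."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, feature_polarities(review))) + b
    return 1 if score > 0 else -1

train = ["great battery and good screen", "bad screen and poor battery"]
model = train_perceptron([feature_polarities(r) for r in train], [1, -1])
print(classify(model, "the screen is bad and the battery is poor"))  # prints -1
```

The window heuristic is deliberately crude: in the first training review, "good" is credited to both `battery` and `screen`. Dependency parsing or learned feature extraction would attach opinions more precisely.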
ISBN (print): 9789898425409
Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for extracting relevant data efficiently and reliably. Both academia and industry have developed several approaches to Web data extraction, for example using artificial intelligence or machine learning techniques. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision in the information extracted from Web pages and, at the same time, must be robust so as not to compromise the quality and reliability of the data themselves. In this paper we focus on experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different versions of a Web page, in order to handle modifications, avoid failures of data extraction tasks, and ensure the reliability of the extracted information. Our purpose is to evaluate the performance, advantages, and drawbacks of our novel system for automatic wrapper adaptation.
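The abstract does not name its similarity algorithm. A classic way to compare two versions of a page's DOM is Simple Tree Matching, which counts the largest set of node pairs that can be matched while preserving tag equality and the parent/child and sibling order. The sketch below assumes DOM nodes are modelled as hypothetical `(tag, [children])` tuples and uses one common normalization (matched nodes over the average tree size).

```python
from typing import Tuple

# A DOM node is modelled (hypothetically) as (tag, [children]).
Node = Tuple[str, list]

def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum order-preserving matching between trees a and b."""
    if a[0] != b[0]:
        return 0  # roots with different tags cannot be matched
    ca, cb = a[1], b[1]
    m, n = len(ca), len(cb)
    # M[i][j]: best matching between the first i children of a and the first j of b
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(M[i][j - 1], M[i - 1][j],
                          M[i - 1][j - 1] + simple_tree_matching(ca[i - 1], cb[j - 1]))
    return M[m][n] + 1  # +1 for the matched roots

def size(t: Node) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(c) for c in t[1])

def similarity(a: Node, b: Node) -> float:
    """Normalized score in [0, 1]: matched nodes over the average tree size."""
    return simple_tree_matching(a, b) / ((size(a) + size(b)) / 2)

old = ("html", [("body", [("div", [("span", [])]), ("table", [])])])
new = ("html", [("body", [("div", [("span", []), ("b", [])]), ("table", [])])])
print(simple_tree_matching(old, new))  # prints 5
```

A wrapper-adaptation step could, for instance, search the new page for the subtree most similar to the one the old wrapper pointed at; that use is our illustration, not necessarily the paper's.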
ISBN (print): 9783642233333
Twitter is the second largest social network after Facebook, and currently 140 million Tweets are posted on average each day. Tweets are messages with a maximum of 140 characters and cover every imaginable story, ranging from simple activity updates through news coverage to opinions on arbitrary topics. In this work we argue that Twitter is a valuable data source for e-Participation projects and describe other domains where Twitter has already been used. We then focus on our own semantic-analysis framework, based on our previously introduced Semantic patterns concept. To highlight the benefits of semantic knowledge extraction for Twitter-related e-Participation projects, we apply the presented technique to Tweets covering the protests in Egypt that started on January 25th and resulted in the ousting of Hosni Mubarak on February 11th, 2011. Based on these results and the lessons learned from previous knowledge extraction tasks, we identify key requirements for extracting semantic knowledge from Twitter.
Developing an effective medical diagnosis system for diseases such as thyroid gland disease, to assist physicians in hospitals, has become a high priority for many researchers and clinical centers. Existing medical diagnostic techniques often carry a risk of misdiagnosis. The purpose of this paper is to develop an efficient classifier that improves the diagnostic performance for thyroid gland disease. In this work, a thyroid gland disease dataset, which represents a multiclass classification problem, was selected from the University of California, Irvine (UCI) Machine Learning Repository. The proposed approach combines support vector machines with an artificial immune system (AIS) as the diagnostic classifier, called the AIS-based machine learning classifier. The diagnosis results were identified and the classification accuracy was evaluated. The results demonstrate that the proposed approach can give considerable improvements over those reported in previous studies.
Feature selection in high-dimensional imbalanced datasets (where one class outnumbers the other) is an essential task in the fields of data mining and bioinformatics. This paper proposes a new technique, called E-SMOTE, for balancing the dataset, and uses SVM classification to select the features. The approach is evaluated on a microarray dataset.
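E-SMOTE is not described here in enough detail to reproduce, but its name suggests it extends SMOTE (Synthetic Minority Over-sampling Technique). The sketch below shows plain SMOTE only, not E-SMOTE: each synthetic minority sample lies at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function name and parameters are our own.

```python
import random
from math import dist

def smote(minority, n_synthetic, k=5, seed=0):
    """Plain SMOTE: interpolate between a random minority point and one of its
    k nearest minority neighbours (Euclidean distance). Needs >= 2 points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 1.0)]
print(smote(minority, n_synthetic=3, k=1))
```

The synthetic points are appended to the minority class before training the classifier; because they are interpolations rather than copies, the classifier is less prone to overfitting than with simple duplication.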
We address the problem of estimating changes in diffusion probability over a social network from observed information diffusion results, where such changes may be caused by an unknown change in the external situation. For this p...
Most feed-forward artificial neural network training algorithms for classification problems are based on an iterative steepest descent technique, whose well-known drawback is slow convergence. A fast alternative is the Extreme Learning Machine (ELM), which computes the Moore-Penrose inverse using SVD. However, most of the training time is spent computing this pseudo-inverse. This paper therefore proposes two fast solutions for computing the pseudo-inverse, based on QR with pivoting and on the Fast General Inverse algorithm: QR-ELM and GENINV-ELM, respectively. Benchmarks are conducted on five standard classification problems: diabetes, satellite images, image segmentation, forest cover type, and SensIT Vehicle (combined). The experimental results clearly show that both QR-ELM and GENINV-ELM speed up ELM training and that the quality of their solutions is comparable to that of the original ELM. They also show that QR-ELM is more robust than GENINV-ELM.
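The ELM recipe can be sketched in plain Python: a random, untrained sigmoid hidden layer produces a matrix H, and only the output weights are fitted, by least squares. If H has full column rank and H = QR, then the Moore-Penrose solution is H⁺t = R⁻¹Qᵀt, so a QR factorization replaces the SVD. This sketch is in the spirit of QR-ELM but uses modified Gram-Schmidt without the column pivoting the paper mentions; it assumes a single output, more samples than hidden nodes, and hypothetical function names. It is not the authors' implementation.

```python
import math
import random

def hidden_layer(X, W, b):
    """H[i][j] = sigmoid(w_j . x_i + b_j): the random, untrained ELM hidden layer."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(wj, xi)) + bj)))
             for wj, bj in zip(W, b)] for xi in X]

def qr_mgs(A):
    """Thin QR of an N x L matrix (N >= L, full column rank), modified Gram-Schmidt."""
    n, l = len(A), len(A[0])
    Q = [row[:] for row in A]
    R = [[0.0] * l for _ in range(l)]
    for k in range(l):
        R[k][k] = math.sqrt(sum(Q[i][k] ** 2 for i in range(n)))
        for i in range(n):
            Q[i][k] /= R[k][k]
        for j in range(k + 1, l):  # remove the q_k component from later columns
            R[k][j] = sum(Q[i][k] * Q[i][j] for i in range(n))
            for i in range(n):
                Q[i][j] -= R[k][j] * Q[i][k]
    return Q, R

def solve_upper(R, y):
    """Back-substitution for R x = y with R upper triangular."""
    l = len(R)
    x = [0.0] * l
    for k in reversed(range(l)):
        x[k] = (y[k] - sum(R[k][j] * x[j] for j in range(k + 1, l))) / R[k][k]
    return x

def train_elm(X, t, n_hidden, seed=0):
    """Random input weights and biases; output weights beta = R^-1 Q^T t."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    Q, R = qr_mgs(hidden_layer(X, W, b))
    qt = [sum(Q[i][k] * t[i] for i in range(len(t))) for k in range(n_hidden)]
    return W, b, solve_upper(R, qt)

def predict_elm(model, X):
    """Linear read-out of the hidden layer; take the sign for two classes."""
    W, b, beta = model
    return [sum(h * bk for h, bk in zip(row, beta)) for row in hidden_layer(X, W, b)]

X = [[float(i), float(j)] for i in range(-2, 3) for j in range(-2, 3)]
t = [1.0 if a + c > 0 else -1.0 for a, c in X]
model = train_elm(X, t, n_hidden=4)
print([1 if y > 0 else -1 for y in predict_elm(model, X)][:5])
```

Modified Gram-Schmidt keeps the sketch short; for ill-conditioned H, a Householder QR with column pivoting (as the abstract's QR-ELM suggests) is numerically safer.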
Feature selection in high-dimensional imbalanced datasets (where one class greatly outnumbers the other) is a demanding task in data mining. Feature selection refers to selecting a subset of features from the original dataset. This paper focuses on two problems: (i) balancing the dataset and (ii) extracting the features. A new technique called the Evolutionary Sampling Technique (EST) is developed to balance the dataset, and Support Vector Machine (SVM) classification is used to calculate the accuracy and to overcome the overfitting problem while sampling the dataset. The techniques are evaluated on a microarray dataset.