ISBN: 9781509019151 (print)
Building on traditional topic-based text classification methods, this paper applies machine learning to the coarse-grained sentiment classification of reviews. Sentiment classification raises a number of problems. Here, the sentiment Vector Space Model (s-VSM) is used for text representation to address data sparseness. In addition, the critical issues of sentiment classification, i.e. the choice of classification algorithm, the choice of feature selection method and the choice of feature dimension, are examined experimentally. Furthermore, to account for both the contribution of a feature to the entire corpus and its contribution to each category, a feature selection method named Chi-square Difference between the Positive and Negative Categories (CDPNC) is proposed. It combines document frequency (DF) with the chi-square statistic (CHI) and achieves better performance. In experiments, the Macro-F and Micro-F reached 90.18% and 90.08%, respectively.
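The Macro-F and Micro-F scores reported above average per-class results in two different ways. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; taken as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    # Macro: average the per-class F1 scores (every class weighs the same).
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy, made-up labels: a classifier that always answers "pos".
macro, micro = macro_micro_f1(["pos", "pos", "pos", "neg"], ["pos"] * 4)
print(f"macro-F = {macro:.3f}, micro-F = {micro:.3f}")
```

On imbalanced data the two scores can diverge sharply, because Macro-F penalizes a class that is never predicted correctly while Micro-F is dominated by the frequent class.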
ISBN: 9781467399043 (print)
With the completion of human genome sequencing, large amounts of data, especially amino acid sequences, have flooded into biological databases. How to analyze these data quickly, and even predict the structure and function of proteins correctly, has become a hot topic in recent years. In this paper, we mainly study the K-means clustering algorithm and the KNN classifier on complex amino acid sequence data, applying them to the prediction of protein sub-cellular localization. Fuzzy boundaries and class imbalance appear frequently in biological data, and accuracy drops if predictions are made directly with traditional KNN and K-means clustering. First, to handle the imbalance, we propose a within-class approach that ensures training samples are selected from each class around the test sample, and we introduce membership to decide which class the test sample belongs to. Then, we bring in rough sets and membership to handle the fuzzy boundaries. In particular, we apply the correlation coefficient within the rough sets to better reflect the relationships among data objects. Experimental results on protein sub-cellular localization prediction show that the proposed methods work better than the traditional ones.
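One way to read the within-class idea is to pick the k nearest training samples separately inside each class and then turn the per-class distances into membership degrees. The sketch below follows that reading only; the abstract does not give the membership formula, so the inverse-mean-distance score, the function name and the toy data are all assumptions.

```python
import math


def within_class_knn(train, query, k=2, eps=1e-9):
    """Classify `query` using the k nearest samples taken from EACH class.

    train: list of (feature_vector, label) pairs.
    Returns (predicted_label, memberships), where memberships sum to 1.
    The membership formula is an illustrative assumption, not the paper's.
    """
    distances_by_class = {}
    for features, label in train:
        distances_by_class.setdefault(label, []).append(math.dist(features, query))

    scores = {}
    for label, distances in distances_by_class.items():
        nearest = sorted(distances)[:k]  # neighbours chosen within this class only
        mean_distance = sum(nearest) / len(nearest)
        scores[label] = 1.0 / (mean_distance + eps)  # closer class, higher score

    total = sum(scores.values())
    memberships = {label: score / total for label, score in scores.items()}
    return max(memberships, key=memberships.get), memberships


# Toy imbalanced data: six samples of class "A", two of class "B".
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"), ((1, 1), "A"),
         ((2, 2), "A"), ((3, 3), "A"), ((5, 5), "B"), ((5, 4), "B")]
label, memberships = within_class_knn(train, (4, 4), k=2)
print(label, memberships)
```

Because every class contributes the same number of neighbours, a large class cannot outvote a small one simply by having more samples near the query.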
Control charts have been widely used to improve manufacturing processes by reducing variations and defects. In particular, multivariate control charts have been effectively applied to monitoring processes that contain many correlated variables. Most existing multivariate control charts are vulnerable to misclassification errors that arise from the underlying hypothesis tests, and these errors often generate a large number of false alarms. In this paper, we propose a procedure to reduce false alarms by combining a multivariate control chart and data mining algorithms. Simulation and real case studies demonstrate that the proposed method effectively reduces the false alarm rate. (C) 2015 Elsevier Ltd. All rights reserved.
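The abstract does not name the chart it builds on. As an illustration of how a multivariate control chart raises alarms through a hypothesis test, the sketch below uses Hotelling's T² statistic for two variables, a common choice that is assumed here. The in-control mean and covariance are treated as known, in which case T² follows a chi-square distribution with 2 degrees of freedom and its upper quantile has the closed form -2 ln(alpha). All numbers are made up.

```python
import math


def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)^T cov^{-1} (x - mean) for a 2-variable observation."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def upper_control_limit(alpha):
    """Chi-square(2 dof) upper quantile; valid only for known parameters."""
    return -2.0 * math.log(alpha)


# Hypothetical in-control parameters and observations.
mean = (0.0, 0.0)
cov = ((1.0, 0.5), (0.5, 1.0))
ucl = upper_control_limit(alpha=0.01)
for obs in [(1.0, 1.0), (3.0, -3.0)]:
    t2 = hotelling_t2(obs, mean, cov)
    print(obs, round(t2, 3), "ALARM" if t2 > ucl else "in control")
```

With a false alarm probability of alpha per observation, an in-control process still signals about once every 1/alpha points, which is the kind of false alarm the abstract sets out to reduce.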
Natural cork stoppers are commercially graded into quality classes according to the homogeneity of their external surface. The underlying criteria for this classification are subjective, with no quantified criteria or standards defined by the cork industry or consumers. Image analysis was applied to premium, good and standard quality classes to characterize the surface of the cork stoppers, and stepwise discriminant analysis (SDA) was used to build predictive classification models. The final goal is to analyze the contribution of each porosity feature and propose an algorithm for classifying cork stoppers into quality classes. This study provides knowledge, based on a large sample, toward an accurate grading of natural cork stoppers. On average, all the models showed an accuracy of over 68% relative to the commercial classification, with a higher mismatch in the mid-quality range. Color showed an important discriminating power, increasing the accuracy by 10%. The main discriminant features were the porosity coefficient and color variables, calculated for the lateral surface. A quality classification algorithm was presented based on a simplified model with an accuracy of 75%. Classification based on color vision systems can ensure improved quality class uniformity and higher transparency in trade. (C) 2013 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
An unsupervised classification algorithm utilising both polarimetric scattering mechanisms (PSMs) of hybrid-polarity data and the Wishart classifier is proposed. The initial scattering categories of the proposed algorithm are derived from the roll-invariant m-chi classification algorithm. Pixels with no clearly defined dominant PSM are excluded, and the resulting categories are expanded into a specified number of classes. These derived classes are taken as training samples of the Wishart classifier. The effectiveness of the proposed algorithm is validated with the dataset over San Francisco.
In this paper, the Extended Coupled Amplitude Delay Lock Loop (ECADLL) architecture, previously introduced as a solution able to deal with a multipath environment, is revisited and improved to tailor it to spoofing detection purposes. Exploiting a properly-defined decision algorithm, the architecture is able to effectively detect a spoofer attack, as well as distinguish it from other kinds of interference events. The new algorithm is used to classify them according to their characteristics. We also introduce the use of a ratio metric detector in order to reduce the detection latency and the computational load of the architecture.
Global Positioning System (GPS) technologies have been increasingly considered as an alternative to traditional travel survey methods to collect activity-travel data. Algorithms applied to extract activity-travel patterns vary from informal ad-hoc decision rules to advanced machine learning methods and differ in accuracy. This paper systematically compares the relative performance of different algorithms for the detection of transportation modes and activity episodes. In particular, naive Bayesian, Bayesian network, logistic regression, multilayer perceptron, support vector machine, decision table, and C4.5 algorithms are selected and compared on the same data according to their overall error rates and hit ratios. Results show that the Bayesian network performs better than the other algorithms in terms of the percentage of correctly identified instances and Kappa values for both the training data and test data, in the sense that the Bayesian network is relatively efficient and generalizable in the context of GPS data imputation.
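The two comparison measures mentioned above, the percentage of correctly identified instances and the Kappa value, can both be read off a confusion matrix. The sketch below uses the standard Cohen's kappa definition; the function name and the two-class counts are invented for illustration and are not taken from the paper.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of instances of true class i that were
    predicted as class j. Kappa is undefined if chance agreement equals 1.
    """
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for two transportation modes (rows: true, cols: predicted).
confusion = [[40, 10],
             [5, 45]]
accuracy, kappa = accuracy_and_kappa(confusion)
print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa corrects the raw hit ratio for agreement that would occur by chance, which matters when one mode or activity type dominates the data.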
Recent developments in on-board technology have enabled automatic collection of follow-up data on forwarder work. The objective of this study was to exploit this possibility to obtain highly representative information on time consumption of specific work elements (including overlapping crane work and driving), with one load as unit of observation, for large forwarders in final felling operations. The data used were collected by the John Deere TimberLink system as nine operators forwarded 8868 loads, in total, at sites in mid-Sweden. Load-sizes were not available. For the average and median extraction distances (219 and 174 m, respectively), Loading, Unloading, Driving empty, Driving loaded and Other time effective work (PM) accounted for ca. 45, 19, 8.5, 7.5 and 14% of total forwarding time consumption, respectively. The average and median total time consumptions were 45.8 and 42.1 minutes/load, respectively. The developed models explained large proportions of the variation of time consumption for the work elements Driving empty and Driving loaded, but minor proportions for the work elements Loading and Unloading. Based on the means, the crane was used during 74.8% of Loading PM time, the driving speed was nonzero during 31.9% of the Loading PM time, and Simultaneous crane work and driving occurred during 6.7% of the Loading PM time. Time consumption per load was more strongly associated with Loading drive distance than with extraction distance, indicating that the relevance of extraction distance as a main indicator of forwarding productivity should be re-considered.
ISBN: 9781479988907 (print)
The heart is the most vital organ, circulating blood along with nutrients and oxygen throughout the body. There are a number of reasons that may affect its normal functioning. In this paper, ten heart diseases, as well as the normal condition, have been classified by extracting features from original ECG (electrocardiogram) signals and from sixth-level wavelet-transformed ECG signals. The results have been compared, and improved accuracy has been obtained using the wavelet-transformed signals.
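A sixth-level wavelet transform repeatedly splits a signal into a coarse approximation and a detail band, six times in a row. The abstract does not say which wavelet or which features were used, so the sketch below assumes the Haar wavelet for simplicity and runs on a synthetic signal rather than real ECG data.

```python
import math


def haar_step(signal):
    """One Haar analysis step: pairwise scaled sums and differences."""
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT.

    Returns (approximation, details), where details[0] is the finest band
    and details[-1] the coarsest. The length must be a multiple of 2**levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Synthetic 128-sample signal standing in for an ECG segment (not real data).
signal = [math.sin(2 * math.pi * i / 32) for i in range(128)]
approx, details = haar_dwt(signal, levels=6)
print(len(approx), [len(d) for d in details])
```

Features for a classifier could then be computed per band, for example the energy of each detail band, although that particular choice is an assumption and not something the abstract states.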
ISBN: 9781628415582 (print)
Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in Blood Transfusion Organizations could be useful for improving the performance of the blood donation service. The aim of this research is to predict the healthiness of blood donors in a Blood Transfusion Organization (BTO). To this end, three well-known algorithms, namely Decision Tree C4.5, the Naive Bayesian classifier, and the Support Vector Machine (SVM), have been chosen and applied to a real database of 11006 donors. Seven fields (sex, age, job, education, marital status, type of donor, and results of blood tests, i.e. doctors' comments and lab results about healthy or unhealthy blood donors) have been selected as input to these algorithms. The results of the three algorithms have been compared and an error cost analysis has been performed. According to the obtained results, the best algorithm, with low error cost and high accuracy, is SVM. This research helps a BTO build a model of blood donors in each area in order to predict whether a donor's blood is healthy or unhealthy. It could be useful if used in parallel with laboratory tests to better separate unhealthy blood.
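An error cost analysis weights each kind of misclassification by how harmful it is, so two classifiers with similar accuracy can rank differently. The sketch below illustrates the idea only: the confusion counts, the cost values and the model names are hypothetical and are not results from the paper.

```python
def accuracy(confusion):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def total_error_cost(confusion, cost):
    """Sum of count * cost over all (true class, predicted class) cells."""
    return sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )


# Rows: true class (healthy, unhealthy). Columns: predicted class.
# Assumed costs: passing unhealthy blood as healthy is 10x worse than the reverse.
cost = [[0, 1],
        [10, 0]]
model_a = [[90, 10],
           [5, 95]]   # hypothetical counts
model_b = [[98, 2],
           [12, 88]]  # hypothetical counts

for name, confusion in [("model_a", model_a), ("model_b", model_b)]:
    print(name, accuracy(confusion), total_error_cost(confusion, cost))
```

In this made-up example model_b is slightly more accurate, yet model_a has the lower total cost because it lets fewer unhealthy donors through, which is why accuracy and error cost are worth reporting together.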