Web data extraction aims to obtain the information that users want from Web pages. To extract valuable data more accurately, this paper proposes a new method called data extraction based on index path in Web (DE...
The regularization parameter and the kernel parameter play important roles in the performance of the least squares support vector machine (LS-SVM). To optimize the LS-SVM's parameters, a fast method based on d...
Person name ambiguity is widespread on web pages because one name may be shared by different people, so it is important to uniquely identify a given person on the web. In this paper, the method Baidu-PN...
The College English Test Band Four (CET4) in China has had a significant impact on evaluating the basic English level of a college student or a class. How to improve college English teaching and go further to r...
The performances of semisupervised clustering for unlabeled data are often superior to those of unsupervised learning, which indicates that semantic information attached to clusters can significantly improve feature representation *** a graph convolutional network (GCN), each node contains information about itself and its neighbors that is beneficial to common and unique features among *** these findings, we propose a deep clustering method based on GCN and semantic feature guidance (GFDC) in which a deep convolutional network is used as a feature generator, and a GCN with a softmax layer performs clustering ***, the diversity and amount of input information are enhanced to generate highly useful representations for downstream ***, the topological graph is constructed to express the spatial relationship of *** a pair of datasets, feature correspondence constraints are used to regularize clustering loss, and clustering outputs are iteratively *** external evaluation indicators, i.e., clustering accuracy, normalized mutual information, and the adjusted Rand index, and an internal indicator, i.e., the Davies-Bouldin index (DBI), are employed to evaluate clustering *** results on eight public datasets show that the GFDC algorithm is significantly better than the majority of competitive clustering methods, i.e., its clustering accuracy is 20% higher than the best clustering method on the United States Postal Service *** GFDC algorithm also has the highest accuracy on the smaller Amazon and Caltech ***, DBI indicates the dispersion of cluster distribution and compactness within the cluster.
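The internal indicator named in the abstract above (DBI, usually known as the Davies-Bouldin index) can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names `centroid` and `davies_bouldin`, the Euclidean distance, and the toy data are assumptions made for illustration. The index averages, over clusters, the worst-case ratio of within-cluster scatter to between-centroid distance, so lower values indicate more compact and better separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance (illustrative sketch).

    Assumes at least two clusters and that no two centroids coincide.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    cents = {lab: centroid(ps) for lab, ps in clusters.items()}
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in ps) / len(ps)
        for lab, ps in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst (largest) similarity ratio between cluster i and any other.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (hypothetical): two well-separated groups of two points each.
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(pts, [0, 0, 1, 1])   # compact, well-separated grouping
bad = davies_bouldin(pts, [0, 1, 0, 1])    # each cluster straddles both groups
print(good, bad)
```

On this toy data the well-separated labeling yields a smaller index than the mixed labeling, which matches the reading of DBI as a joint measure of dispersion between clusters and compactness within them.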
The fuzzy k-nearest neighbor (F-KNN) algorithm, originally developed by Keller in 1985, generalized the k-nearest neighbor (KNN) algorithm and could overcome the drawback of KNN in which all instances were...
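As a rough illustration of the idea behind F-KNN, the sketch below assigns a query point a class membership that is a distance-weighted average over its k nearest neighbors, with weights proportional to 1 / d^(2/(m-1)). The function name `fuzzy_knn`, the fuzzifier default m = 2, the crisp (0 or 1) training memberships, and the small distance floor are all assumptions chosen for the example, not details taken from the abstract.

```python
import math


def fuzzy_knn(train_X, train_y, x, k=3, m=2.0):
    """Return a dict mapping each class to its fuzzy membership for x.

    Assumes crisp training labels and m > 1 (illustrative sketch).
    """
    # Find the k training points closest to x.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    neighbors = order[:k]

    # Inverse-distance weights; the floor avoids division by zero
    # when x coincides with a training point.
    exponent = 2.0 / (m - 1.0)
    weights = {
        i: 1.0 / max(math.dist(train_X[i], x), 1e-12) ** exponent
        for i in neighbors
    }
    norm = sum(weights.values())

    memberships = {label: 0.0 for label in set(train_y)}
    for i in neighbors:
        memberships[train_y[i]] += weights[i] / norm
    return memberships


# Hypothetical one-dimensional toy data.
X = [(0.0,), (1.0,), (10.0,)]
y = ["a", "a", "b"]
print(fuzzy_knn(X, y, (0.5,), k=3))
```

Unlike plain KNN voting, the distant neighbor labeled "b" still contributes, but only with a very small weight, so the output is a graded membership rather than a hard vote.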
Active learning is a hot topic in the machine learning field. Its main task is to automatically select representative instances so as to efficiently reduce the sample complexity. This paper presents a ...
In this paper, an improved cluster-oriented decision tree algorithm, named ICFDT for short, is presented. In this algorithm, the fuzzy C-means clustering algorithm (FCM) without instance labels is used to split the nodes and...
The NN algorithm is a simple and well-known supervised learning scheme that classifies an unseen instance by finding its closest neighbor in the training set. The main drawback of NN is that the whole training set must b...
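The NN rule described above can be sketched in a few lines. The function name `nn_classify`, the Euclidean distance, and the toy data are assumptions for illustration only. Note that each query scans every stored training instance, which makes the method's dependence on the full training set visible.

```python
import math


def nn_classify(train_X, train_y, x):
    """Label x with the class of its single closest training instance."""
    # Linear scan over the whole training set for the nearest neighbor.
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


# Hypothetical two-dimensional toy data.
X = [(0, 0), (1, 1), (8, 8), (9, 9)]
y = ["low", "low", "high", "high"]
print(nn_classify(X, y, (2, 1)))
print(nn_classify(X, y, (7, 9)))
```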
Email is a kind of semi-structured document whose structure contains some important attributes; in particular, using spam-specific features could improve email classification results. In this paper, we appl...