ISBN: 0769524206 (print)
This paper proposes a learning approach for discovering the semantic structure of web pages. The task includes partitioning the text on a web page into information blocks and identifying their semantic categories. We employed two machine learning techniques, AdaBoost and SVMs, to learn from a labeled web page corpus. We evaluated our approach on general web pages from the World Wide Web and obtained encouraging results. This work can benefit a number of web-driven applications such as search engines, web-based question answering, web-based data mining, and voice-enabled web navigation.
ISBN: 3540269231 (print)
A general automatic method for clinical image segmentation is proposed. Tailored for the clinical environment, the proposed segmentation method consists of two stages: a learning stage and a clinical segmentation stage. During the learning stage, manually chosen representative images are segmented using a variational level set method driven by a pathologically modelled energy functional. Then a window-based feature extraction is applied to the segmented images. Principal component analysis (PCA) is applied to these extracted features and the results are used to train a support vector machine (SVM) classifier. During the clinical segmentation stage, the input clinical images are classified with the trained SVM. The proposed method combines the strengths of both machine learning and variational level sets while limiting their weaknesses, achieving automatic and fast clinical segmentation. Both chest (thoracic) computed tomography (CT) scans (2D and 3D) and dental X-rays are used to test the proposed method. Promising results are demonstrated and analyzed. The proposed method can be used during preprocessing for automatic computer-aided diagnosis.
ISBN: 0769522785 (print)
This paper aims to take general tensors as inputs for supervised learning. A supervised tensor learning (STL) framework is established for convex-optimization-based learning techniques such as support vector machines (SVM) and minimax probability machines (MPM). Within the STL framework, many conventional learning machines can be generalized to take nth-order tensors as inputs. We also study the applications of tensors to learning machine design and feature extraction by linear discriminant analysis (LDA). Our method for tensor-based feature extraction is named tensor rank-one discriminant analysis (TR1DA). These generalized algorithms have several advantages: 1) they reduce the curse of dimensionality problem in machine learning and data mining; 2) they avoid failure to converge; and 3) they achieve better separation between the different categories of samples. As an example, we generalize MPM to its STL version, which is named the tensor MPM (TMPM). TMPM learns a series of tensor projections iteratively. It is then evaluated against the original MPM. Our experiments on a binary classification problem show that TMPM significantly outperforms the original MPM.
ISBN: 0889865280 (print)
In this paper, linear and unsupervised dimensionality reduction via matrix factorization with nonnegativity constraints is studied when applied to feature extraction, followed by pattern recognition. Since matrix factorization is typically done iteratively, convergence can be slow. To alleviate this problem, a significantly (more than 11 times) faster algorithm is proposed, which does not cause severe degradation in classification accuracy when dimensionality reduction is followed by classification. These results are due to two modifications of the previous algorithms: feature scaling (normalization) prior to the beginning of iterations, and initialization of iterations, combining two techniques for mapping unseen data.
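To illustrate why iterative nonnegative matrix factorization can be slow, the following is a minimal sketch of the standard multiplicative-update scheme (Lee and Seung), which is swapped in here as a baseline. It is not the faster algorithm the abstract proposes, and it does not reproduce the feature scaling or initialization the abstract describes. The function names, the random initialization, and the `eps` guard are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) into W (n x rank) and H (rank x m).

    Baseline multiplicative updates; random initialization is an assumption.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h
```

Each iteration costs several matrix products, which is why the number of iterations, and therefore the initialization, matters for running time. Calling `nmf(v, rank)` on a nonnegative matrix returns nonnegative factors whose product approximates `v`.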
ISBN: 0769522866 (print)
This paper presents new opportunities for applying linguistic pattern recognition algorithms to the computer understanding of the semantic content of images in intelligent information systems. Successfully obtaining the crucial semantic information of an image, especially a medical one, may contribute considerably to the creation of new intelligent cognitive information systems. Thanks to new algorithms of cognitive resonance between the stream of data extracted from the image and expectations taken from the representation of medical knowledge, we can understand the substantive content of the image even if its form is very different from any known pattern. It seems that in the near future, automatic image understanding may become one of the effective tools for semantic interpretation and intelligent storage of visual data in distributed databases. In this article we try to show that structural techniques may be applied to tasks related to automatic classification and machine perception of the semantic meaning of selected classes of medical patterns.
ISBN: 9780898715934 (print)
Data classification is an important topic in the data mining field due to its wide applications. A number of related methods have been proposed based on well-known learning models such as decision trees or neural networks. However, these kinds of classification methods may not perform well in mining time-sequence datasets such as time-series gene expression data. In this paper, we propose a new data mining method, namely Classify-By-Sequence (CBS), for classifying large time-series datasets. The main methodology of the CBS method is to integrate sequential pattern mining with probabilistic induction, so that the inherent sequential patterns can be extracted efficiently and the classification task can be done more accurately. Meanwhile, the CBS method has the merit of simplicity in implementation. Through experimental evaluation, the CBS method is shown to greatly outperform other methods in classification accuracy.
ISBN: 3540269231 (print)
Derivative-free optimization methods have recently gained a lot of attention for neural learning. The curse of dimensionality in the neural learning problem makes local optimization methods very attractive; however, the error surface contains many local minima. The discrete gradient method is a special case of derivative-free methods based on bundle methods and has the ability to jump over many local minima. Two types of problems arise when local optimization methods are used for neural learning. The first is sensitivity to the initial point, which is commonly solved by using a hybrid model. Our earlier research has shown that combining the discrete gradient method with global methods such as evolutionary algorithms makes it even more attractive. These types of hybrid models have also been studied by other researchers. The second, less frequently mentioned problem is that of large weight values for the synaptic connections of the network. Large synaptic weight values often lead to paralysis and convergence problems, especially when a hybrid model is used for fine-tuning the learning task. In this paper we study and analyse the effect of different regularization parameters in our objective function to restrict the weight values without compromising classification accuracy.
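The effect of a regularization parameter on weight magnitude can be seen in a small sketch. The abstract does not give the form of the regularized objective, so this example assumes a squared-weight (L2) penalty `lam * w**2` added to the mean log-loss of a single sigmoid neuron. It also uses plain gradient descent rather than the discrete gradient method, purely to keep the example short. All names and parameter values are hypothetical.

```python
from math import exp


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + exp(-z))


def train(xs, ys, lam, steps=500, lr=0.5):
    """Gradient descent on mean log-loss plus lam * w**2 for a one-input neuron.

    The L2 penalty and the optimizer are assumptions, not the paper's method.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the data term plus gradient of the penalty lam * w**2
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + 2.0 * lam * w
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On linearly separable data the unpenalized weight keeps growing as training continues, while a positive `lam` holds it at a finite value. In this toy setting the decision boundary, and hence the classification of the training points, is unchanged.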
ISBN: 9780898715934 (print)
Discretization is the process of converting the continuous attributes of a database into discrete ones in order to apply certain classification algorithms. This is an important problem in developing generally applicable methods in machine learning and data mining for classification and prediction. This paper introduces a new technique for discretization based on successive pseudo deletion of instances to reduce the number of conflicting instances, i.e., by reducing noise in the database. Such successive pseudo deletions in the database are performed by introducing threshold points on the maximum-information-gain boundary points of the continuous attributes. Our empirical experiments show that state-of-the-art learning algorithms such as CN2, C4.5, Naive-Bayes and RISE perform better with the discretized output from our method than with the output of other state-of-the-art discretization algorithms.
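The notion of a maximum-information-gain boundary point can be sketched as follows. This covers only the standard entropy-based choice of a single cut point for one continuous attribute. The pseudo-deletion procedure itself is not described in enough detail in the abstract to reproduce, so it is omitted. The function names and the simplified definition of a boundary point (a midpoint between adjacent distinct values with different labels) are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_boundary_cut(values, labels):
    """Return (cut, gain) for the boundary point with maximum information gain.

    A boundary point is taken to be the midpoint between two adjacent distinct
    values whose class labels differ. Returns (None, 0.0) if there is none.
    """
    pairs = sorted(zip(values, labels))
    base = entropy([lab for _, lab in pairs])
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        (v_prev, l_prev), (v_next, l_next) = pairs[i - 1], pairs[i]
        if v_prev == v_next or l_prev == l_next:
            continue  # not a boundary point under the simplified definition
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_cut, best_gain = (v_prev + v_next) / 2, gain
    return best_cut, best_gain
```

Given an attribute column and its class labels, `best_boundary_cut` returns the threshold that best separates the classes. Applying it recursively to the two resulting halves would yield a multi-interval discretization.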
We focus on the problem of prediction with confidence and describe a recently developed learning algorithm, the transductive confidence machine, for making qualified region predictions. Its main advantage in comparison with other classifiers is that it is well-calibrated, with the number of prediction errors strictly controlled by a given predefined confidence level. We apply the transductive confidence machine to the problems of acute leukaemia and ovarian cancer prediction using microarray and proteomics pattern diagnostics, respectively. We demonstrate that the algorithm performs well, yielding well-calibrated and informative predictions whilst maintaining a high level of accuracy.
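A region prediction of this kind can be sketched in a few lines. The abstract does not state which nonconformity measure the authors use, so this example assumes a common nearest-neighbour score: the distance to the nearest example of the same class divided by the distance to the nearest example of a different class. The function names and interface are hypothetical, and the training set is assumed to contain at least two classes.

```python
from math import dist, inf


def nonconformity(i, points, labels):
    """1-NN score: nearest same-class distance over nearest other-class distance."""
    same = min((dist(points[i], p) for j, p in enumerate(points)
                if j != i and labels[j] == labels[i]), default=inf)
    other = min((dist(points[i], p) for j, p in enumerate(points)
                 if labels[j] != labels[i]), default=inf)
    if other == 0:
        return inf  # coincides with an example of another class
    return same / other


def region_prediction(train_x, train_y, x, epsilon):
    """Return (region, p_values) for a new point x at confidence 1 - epsilon.

    Each candidate label is tentatively assigned to x, all scores are
    recomputed, and the p-value is the fraction of examples that are at
    least as nonconforming as x under that assignment.
    """
    p_values = {}
    for candidate in sorted(set(train_y)):
        points = list(train_x) + [x]
        labels = list(train_y) + [candidate]
        scores = [nonconformity(i, points, labels) for i in range(len(points))]
        p_values[candidate] = sum(s >= scores[-1] for s in scores) / len(scores)
    region = {label for label, p in p_values.items() if p > epsilon}
    return region, p_values
```

The returned region contains every label whose p-value exceeds `epsilon`. A region with exactly one label is a confident prediction, while an empty region or one with several labels signals that the example is unusual or ambiguous at the chosen confidence level.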