ISBN: 3642111637 (print)
The proceedings contain 101 papers. The topics discussed include: new approaches to design and control of time limited search algorithms; feature selection using non linear feature relation index; a geometric algorithm for learning oblique decision trees; effect of subsampling rate on subbagging and related ensembles of stable classifiers; constructive semi-supervised classification algorithm and its implement in data mining; a fast supervised method of feature ranking and selection for pattern classification; clustering in concept association networks; application of neural networks in preform design of aluminium upsetting process considering different interfacial frictional conditions; incorporating fuzziness to CLARANS; development of a neuro-fuzzy MR image segmentation approach using fuzzy c-means and recurrent neural network; and identification of N-glycosylation sites with sequence and structural features employing random forests.
ISBN: 9783642030697 (print)
A new approach to relevant feature selection in machine learning is proposed for the case of ordered features. Feature selection and regularization of the decision rule are combined in a single procedure. Features are selected by introducing weight coefficients that characterize the degree of relevance of each feature. A priori information about feature ordering is taken into account in the form of a quadratic penalty or an absolute-value penalty on the difference between the weight coefficients of neighboring features. A study of the absolute-value penalty function shows the computational complexity of such a formulation, and an effective solution method is proposed. A brief survey of the authors' early papers, the mathematical framework, and experimental results are provided.
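As a rough illustration only, the two penalties on neighboring weight coefficients might be written as follows. The training loss L(w), the trade-off parameter λ, and the number of features n are assumptions introduced here; the abstract does not give the authors' exact criterion.

```latex
J_{\mathrm{quad}}(w) = L(w) + \lambda \sum_{i=2}^{n} \left( w_i - w_{i-1} \right)^2,
\qquad
J_{\mathrm{abs}}(w) = L(w) + \lambda \sum_{i=2}^{n} \left| w_i - w_{i-1} \right|
```

The quadratic form is smooth, whereas the absolute-value form is not differentiable where neighboring weights coincide, which is one plausible source of the computational difficulty the abstract mentions.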
ISBN: 9780769538594 (print)
Building on secondary analysis techniques that identify specific sample points using partial least-squares analysis, the recognition method for specific sample points based on the two-dimensional plane plot of the T² ellipse is extended to the three-dimensional T² ellipsoid and to the T² hyper-ellipsoid in high-dimensional space. A further identification method for specific sample points, which uses a hierarchical diagram in high-dimensional space based on hierarchical clustering, is proposed at the same time. Recognizing specific sample points is of great significance to data mining, machine learning, and pattern recognition, because it eliminates samples generated by random factors and refines mathematical models. An empirical analysis of the five identification methods was carried out using ecological data from 56 observation sites along the Bohai Sea coastal zone.
ISBN: 9783642030697 (print)
Pattern mining derives from the need to discover hidden knowledge in very large amounts of data, regardless of the form in which it is presented. Natural Language Processing (NLP), in turn, arose from the human need to be understood by computers. In this paper we present an exploratory approach that aims at bringing together the best of both worlds. Our goal is to discover patterns in linguistically processed texts, through the use of state-of-the-art NLP tools and traditional pattern mining algorithms. Articles from a Portuguese newspaper are the input of a series of tests described in this paper. First, they are processed by an NLP chain which performs a deep linguistic analysis of the text; afterwards, the pattern mining algorithms Apriori and GenPrefixSpan are used. Results showed the applicability of sequential pattern mining techniques to textual structured data and also provided several pieces of evidence about the structure of the language.
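To make the Apriori step concrete, here is a minimal sketch of frequent-itemset mining over linguistically tagged sentences. The part-of-speech tags and the support threshold are invented for illustration and are not taken from the paper; the sketch also covers only unordered itemsets, not the sequential patterns that GenPrefixSpan handles.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        frequent = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
    return result


# Toy "sentences" as bags of part-of-speech tags (hypothetical data).
sentences = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "ADJ"],
    ["NOUN", "VERB"],
    ["DET", "NOUN", "VERB", "ADJ"],
]
patterns = apriori(sentences, min_support=3)
```

With a support threshold of 3, the pair DET and NOUN is frequent in this toy data, while DET and VERB is not.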
ISBN: 9783642111631 (print)
In this paper, we propose an efficient algorithm for anomaly detection from call data records. Anomalous users are detected based on fuzzy attribute values derived from their communication patterns. A clustering based algorithm is proposed to generate explanations to assist human analysts in validating the results.
ISBN: 9783642030697 (print)
Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data.
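The general idea of clustering by connected components can be sketched as follows: link any two items whose similarity exceeds a threshold, then treat each connected component of the resulting graph as a cluster. The Jaccard similarity, the threshold value, and the all-pairs comparison are assumptions made for this illustration; the abstract does not describe how the authors build the graph or achieve scalability.

```python
def connected_component_clusters(docs, threshold):
    """Cluster docs (sets of features) by linking pairs with Jaccard >= threshold."""
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Naive all-pairs comparison; fine for a toy example, not for web-scale data.
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            inter = len(docs[i] & docs[j])
            uni = len(docs[i] | docs[j])
            if uni and inter / uni >= threshold:
                union(i, j)

    # Group item indices by the root of their component.
    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical sparse documents represented as feature sets.
docs = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d", "e"}, {"x", "y"}, {"y", "z"}]
clusters = connected_component_clusters(docs, threshold=0.5)
```

Documents 0 and 2 are not directly similar enough, but both link to document 1, so all three land in one component rather than being fragmented.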
ISBN: 9783642111631 (print)
We report results on stylistic differences in blogging across gender and age groups. The results are based on two mutually independent features. The first feature is the use of slang words, a new concept proposed by us for the stylistic study of bloggers. For the second feature, we analyzed the variation in average sentence length across various age groups and genders. These features are augmented with results of previous studies reported in the literature on stylistic analysis. The combined feature list enhances the accuracy of age and gender prediction by a remarkable extent. These machine learning experiments were done on two separate demographically tagged blog corpora. Gender determination is more accurate than age group detection over data spread across all ages, but the accuracy of age prediction increases if we sample data with a remarkable age difference.
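The two features described above could be computed roughly as in the sketch below. The slang lexicon, the sentence-splitting rule, and the tokenization are placeholders chosen for illustration; the abstract does not say how the authors identify slang words or segment sentences.

```python
import re

# Hypothetical slang lexicon, for illustration only.
SLANG = {"lol", "omg", "gonna", "wanna", "dude"}


def style_features(text, slang=SLANG):
    """Return the average sentence length (in words) and the fraction of slang words."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase word tokens made of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    avg_len = len(words) / len(sentences) if sentences else 0.0
    slang_rate = sum(w in slang for w in words) / len(words) if words else 0.0
    return {"avg_sentence_length": avg_len, "slang_rate": slang_rate}


features = style_features("OMG that was great! I am gonna go again. Lol.")
```

A classifier for age or gender would then take such values, alongside other stylistic features, as its input vector.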
ISBN: 9783642030697 (print)
In this paper, we propose discretization-based schemes to preserve privacy in time series data mining. Traditional research on preserving privacy in data mining focuses on time-invariant privacy issues. With the emergence of time series data mining, traditional snapshot-based privacy issues need to be extended to be multi-dimensional with the addition of the time dimension. In this paper, we define three threat models based on the trust relationship between the data miner and the data providers, and we propose a different scheme for each of these threat models. The proposed schemes are extensively evaluated against publicly available time series data sets [1]. Our experiments show that the proposed schemes can preserve privacy at the cost of a reduction in accuracy. For most data sets, the proposed schemes can achieve low privacy leakage with a slight reduction in classification accuracy. We also study the effect of the parameters of the proposed schemes.
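The basic operation behind a discretization-based scheme can be illustrated with equal-width binning, where each reading is replaced by the index of the interval it falls into, so that exact values are not disclosed. This is a generic sketch; the bin count and the equal-width rule are assumptions, and the three schemes proposed in the paper are not specified in the abstract.

```python
def discretize(series, n_bins):
    """Map each value to the index of its equal-width bin over [min, max]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries a single symbol.
        return [0] * len(series)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in series]


# Hypothetical sensor readings.
readings = [0.0, 1.0, 2.5, 4.9, 5.0, 7.5, 10.0]
symbols = discretize(readings, n_bins=4)
```

Fewer bins hide more detail but also discard more of the signal a classifier could use, which mirrors the privacy and accuracy trade-off reported in the abstract.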
ISBN: 9783642111631 (print)
In this paper, we propose a novel fast training algorithm called Constructive Semi-Supervised Classification Algorithm (CS-SCA) for neural network construction based on the concept of geometrical expansion. Parameters are updated according to the geometrical location of the training samples in the input space, and each sample in the training set is learned only once. The approach is semi-supervised: the training samples are semi-labeled, i.e. labels are known for some samples and unknown for others. The method starts with clustering, which is done using the concept of geometrical expansion. The clustering process forms various clusters, which are visualized as hyperspheres. Once clustering is over, the hyperspheres are labeled: a class is assigned to each hypersphere for classifying the multi-dimensional data. This constructive learning avoids blind selection of the neural network structure. The proposed method is exhaustively tested with different benchmark datasets, and it is found that, as the values of the training parameters increase, both the number of hidden neurons and the training time decrease. From our experimental work we conclude that CS-SCA results in a simple neural network structure with less training time.
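The two stages described above, a single pass that forms hyperspheres followed by labeling of those hyperspheres, can be sketched loosely as below. This is not CS-SCA itself: the fixed radius, the first-fit assignment rule, and the majority vote over known labels are all simplifying assumptions, since the abstract does not specify the geometrical expansion rule.

```python
import math
from collections import Counter


def build_hyperspheres(samples, radius):
    """One pass: each sample joins the first sphere covering it, else seeds a new one."""
    centers = []
    assignment = []
    for x in samples:
        for idx, c in enumerate(centers):
            if math.dist(x, c) <= radius:
                assignment.append(idx)
                break
        else:
            # No existing hypersphere covers x, so x becomes a new center.
            centers.append(tuple(x))
            assignment.append(len(centers) - 1)
    return centers, assignment


def label_hyperspheres(assignment, labels, n_spheres):
    """Give each sphere the majority label of its labeled members (None if none)."""
    votes = [Counter() for _ in range(n_spheres)]
    for idx, lab in zip(assignment, labels):
        if lab is not None:
            votes[idx][lab] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Hypothetical semi-labeled data: None marks an unlabeled sample.
samples = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 4.9), (0.0, 0.4)]
labels = ["a", None, "b", None, None]
centers, assignment = build_hyperspheres(samples, radius=1.0)
sphere_labels = label_hyperspheres(assignment, labels, len(centers))
```

Each sample is visited exactly once, and the number of hyperspheres (and hence hidden units, in the network view) is determined by the data rather than chosen in advance.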
ISBN: 9783642030697 (print)
This paper addresses the problem of automatically learning the title metadata from HTML documents. The objective is to help index Web resources that are poorly annotated. Other works have proposed similar objectives, but they considered only titles in text format. In this paper we propose a general learning schema that allows learning textual titles based on style information and image-format titles based on image properties. We construct features from automatically annotated pages harvested from the Web; this paper details the corpus creation method as well as the information extraction techniques. Based on these features, learning algorithms such as Decision Trees and Random Forests are applied, achieving good results despite the heterogeneity of our corpus. We also show that combining both methods can induce better performance.