Artificial Neural Networks (ANNs), as nonlinear and adaptive information processing systems, play an important role in machine learning, artificial intelligence, and data mining. However, the performance of an ANN is sensitive to the number of neurons, and achieving better network performance while simplifying the network topology are two competing objectives. Genetic Algorithms (GAs) are random search algorithms that simulate natural selection and evolution; they offer good global search ability and can learn an approximately optimal solution without gradient information from the error function. This paper gives a brief survey of ANN optimization with GAs. First, the basic principles of ANNs and GAs are introduced, and by analyzing the advantages and disadvantages of each, the case for using GAs to optimize ANNs is made. Second, we briefly survey the basic theories and algorithms for optimizing network weights, network architectures, and learning rules, and discuss the latest research progress. Finally, we consider the future development of the field.
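As a concrete illustration of the weight-optimization case surveyed above, the sketch below evolves the weights of a tiny fixed-topology network on XOR with a plain GA (tournament selection, uniform crossover, Gaussian mutation, elitism). The network shape, operators, and hyperparameters are illustrative assumptions, not taken from any surveyed paper; note that no gradient of the error function is used anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

N_W = 2 * 2 + 2 + 2 * 1 + 1  # all weights/biases of a 2-2-1 network = 9

def forward(w, x):
    W1 = w[:4].reshape(2, 2); b1 = w[4:6]
    W2 = w[6:8]; b2 = w[8]
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output

def fitness(w):
    return -np.mean((forward(w, X) - y) ** 2)    # higher is better

pop = rng.normal(0, 1, size=(60, N_W))
init_best = max(fitness(w) for w in pop)
for gen in range(300):
    fit = np.array([fitness(w) for w in pop])
    # Tournament selection: keep the better of two random individuals.
    i, j = rng.integers(0, len(pop), (2, len(pop)))
    parents = np.where((fit[i] > fit[j])[:, None], pop[i], pop[j])
    # Uniform crossover with a reversed pairing, then Gaussian mutation.
    mask = rng.random(pop.shape) < 0.5
    children = np.where(mask, parents, parents[::-1])
    children += rng.normal(0, 0.2, children.shape)
    # Elitism: carry the unmutated best individual into the next generation.
    children[0] = pop[np.argmax(fit)]
    pop = children

best = pop[np.argmax([fitness(w) for w in pop])]
pred = forward(best, X)
```

Because of the elitist slot, the best fitness is non-decreasing across generations, so even this crude GA reliably improves on the random initial population.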
Back-Propagation (BP) neural networks, among the most mature and widespread algorithms, are capable of large-scale computation and have unique advantages when dealing with nonlinear, high-dimensional data. However, when high-dimensional data are fed to a BP network, a subset of the feature variables already provides enough information, while too many network inputs complicate the design of the hidden layer, consume considerable storage space and computing time, interfere with the convergence of training, and can ultimately reduce recognition accuracy. Factor analysis (FA) is a multivariate analysis method that transforms many feature variables into a few synthetic variables. Given samples with many feature variables, and taking the structure of the BP network into account, an FA-BP neural network algorithm is proposed. First, the dimensionality of the feature space is reduced with FA; the reduced features are then used as the inputs of the BP network, which is trained and simulated on the resulting low-dimensional data. The algorithm simplifies the network structure, improves the speed of convergence, and saves running time. The new algorithm is then applied to pest prediction. The results show that, without loss of prediction precision, the error of the predicted values is reduced, so the algorithm is effective.
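The two-stage FA-BP pipeline described above can be sketched as follows. PCA (via SVD) is used here as a simple stand-in for factor analysis, since both project many correlated features onto a few components; the synthetic data, network sizes, and hyperparameters are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 200 samples, 20 features driven by 3 latent factors.
Z = rng.normal(size=(200, 3))
X = Z @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(200, 20))
y = (Z[:, 0] > 0).astype(float)            # label depends on one factor

# Stage 1: reduce 20 features to 3 components (PCA as an FA stand-in).
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
F = Xc @ Vt[:3].T                          # reduced inputs for the network

# Stage 2: train a small 3-8-1 BP network on the reduced features.
W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.5
for _ in range(2000):
    H = np.tanh(F @ W1 + b1)
    out = 1 / (1 + np.exp(-(H @ W2 + b2)))
    # Back-propagate the mean cross-entropy error through both layers.
    d_out = (out - y[:, None]) / len(y)
    d_h = (d_out @ W2.T) * (1 - H ** 2)
    W2 -= lr * H.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * F.T @ d_h;   b1 -= lr * d_h.sum(0)

acc = np.mean((out.ravel() > 0.5) == y.astype(bool))
```

The network never sees the 20 raw inputs, only the 3 reduced components, which is what shrinks the input layer and the training cost.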
By exploiting the properties of the ant colony algorithm and the genetic algorithm, a novel ant colony-genetic hybrid algorithm, whose overall framework is the genetic algorithm, is proposed to solve the traveling salesma...
Cloud computing provides dynamic computing services for large-scale data over the Internet. IaaS (information as a service) is one of the utilities that provide information services in Cloud computing. Large volumes of XML data are produced continually on the Internet, so efficient information filtering services are needed. Previous XML filtering approaches target XPath queries; however, many users prefer to describe their requirements with keywords. SLCA (Smallest Lowest Common Ancestor)-based XML keyword search is one of the most important information retrieval approaches. Earlier approaches focus on building a centralized index over a large XML document collection and cannot process continuous XML streams. This paper addresses SLCA computation as a service for continuous XML documents. A novel SLCA computing service is designed in which the SLCAs are obtained in a single scan of the XML stream. We demonstrate the efficiency of our algorithms analytically and experimentally.
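A minimal single-scan SLCA computation in the spirit of the abstract can be sketched as follows: while streaming the document, each open element tracks which keywords its subtree covers and whether a descendant already covered them all; an element is an SLCA iff its subtree covers every keyword but no descendant does. The substring keyword match, the sample document, and returning element tags rather than node identifiers are simplifying assumptions.

```python
import io
import xml.etree.ElementTree as ET

def slca(xml_text, keywords):
    keywords = set(keywords)
    stack = []    # one [covered_keywords, slca_found_below] frame per open element
    results = []
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("start", "end")):
        if event == "start":
            stack.append([set(), False])
        else:  # "end": the element's text and whole subtree are now known
            covered, below = stack.pop()
            text = " ".join(filter(None, [elem.tag, elem.text]))
            covered |= {k for k in keywords if k in text}  # naive substring match
            if covered >= keywords and not below:
                results.append(elem.tag)   # smallest: no qualifying descendant
                below = True
            if stack:                      # propagate to the parent frame
                stack[-1][0] |= covered
                stack[-1][1] |= below
    return results

doc = ("<bib><book><title>XML streams</title><author>Chen</author></book>"
       "<book><title>Cloud</title><author>Li</author></book></bib>")
res = slca(doc, ["XML", "Chen"])           # only the first book qualifies
```

Because each element is finished exactly once, the whole computation is a single pass over the stream with memory proportional to the document depth, which matches the one-scan claim above.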
In the real world, several different objects usually appear in one image, which poses intractable challenges for traditional pattern recognition methods that classify images. In this paper, we introduce a Conditional Random Fields (CRFs) model to deal with the multi-label image classification problem. Considering the correlations among objects, a second-order CRF is constructed to capture the semantic associations between labels. Different initial feature weights are set to introduce voting techniques for better performance. We evaluate our methods on the MSRC dataset and achieve high precision, recall, and F1 measure, showing that our method is competitive.
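The second-order idea of combining per-label evidence with pairwise label associations can be illustrated with a toy scoring function: a label set scores the sum of its unary terms plus pairwise co-occurrence terms, and exhaustive subset search stands in for proper CRF inference (feasible only for a handful of labels). The labels and numbers below are made up, not learned CRF parameters.

```python
import itertools

LABELS = ["cow", "grass", "sky", "car"]
# Unary evidence per label, and pairwise association scores keyed by
# alphabetically sorted label pairs (positive = labels co-occur often).
unary = {"cow": 1.2, "grass": 0.8, "sky": 0.3, "car": -0.5}
pairwise = {("cow", "grass"): 1.0, ("grass", "sky"): 0.4, ("car", "cow"): -1.5}

def score(subset):
    s = sum(unary[l] for l in subset)
    for a, b in itertools.combinations(sorted(subset), 2):
        s += pairwise.get((a, b), 0.0)
    return s

# Exhaustive search over all non-empty label subsets.
best = max((subset for r in range(1, len(LABELS) + 1)
            for subset in itertools.combinations(LABELS, r)),
           key=score)
```

Here "sky" makes it into the best set only because of its association with "grass", while "car" is excluded by its negative association with "cow"; that interaction between labels is exactly what a pure per-label classifier cannot express.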
The enlarging volumes of data resources produced in the real world make the classification of very large-scale data a challenging task, so parallel processing of very large, high-dimensional data is very important. Hyper-Surface Classification (HSC) has proved to be an effective and efficient classification algorithm for two- and three-dimensional data. Although HSC can be extended to high-dimensional data through dimension reduction or ensemble techniques, handling high-dimensional data directly is not trivial. Inspired by the decision tree idea, this work proposes an improvement of HSC that deals with high-dimensional data directly. Furthermore, we parallelize the improved HSC algorithm (PHSC) to handle large-scale, high-dimensional data on the MapReduce framework, a popular and powerful parallel programming model used in many fields. Experimental results show that the parallel improved HSC algorithm not only deals with high-dimensional data directly but also handles large-scale data sets, and the scaleup, speedup, and sizeup evaluation criteria validate its efficiency.
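The MapReduce flavour of the approach can be sketched in miniature: each "mapper" classifies queries against the regions covered by its own partition of the training data, and the "reducer" merges the per-partition votes. Grid-cell majority voting stands in for HSC's hyper-surface regions here, and the map/reduce functions are plain Python rather than actual Hadoop jobs; data and partitioning are illustrative.

```python
from collections import Counter, defaultdict

def to_cell(x, step=1.0):
    return tuple(int(v // step) for v in x)   # grid cell ~ a classifier region

def mapper(partition, queries):
    # Build this partition's local region table, then vote on each query.
    cells = defaultdict(Counter)
    for x, label in partition:
        cells[to_cell(x)][label] += 1
    out = []
    for i, q in enumerate(queries):
        votes = cells.get(to_cell(q))
        if votes:                              # emit (query_id, local vote)
            out.append((i, votes.most_common(1)[0][0]))
    return out

def reducer(mapped):
    # Merge votes from all mappers into one prediction per query.
    merged = defaultdict(Counter)
    for part in mapped:
        for i, label in part:
            merged[i][label] += 1
    return {i: c.most_common(1)[0][0] for i, c in merged.items()}

train = [((0.2, 0.3), "A"), ((0.7, 0.1), "A"), ((3.5, 3.2), "B"), ((3.1, 3.9), "B")]
parts = [train[:2], train[2:]]                 # two partitions = two mappers
queries = [(0.5, 0.5), (3.3, 3.3)]
pred = reducer([mapper(p, queries) for p in parts])
```

The key property this mirrors is that mappers never need each other's data, so the map phase scales out with the number of partitions and only the small vote lists travel to the reducer.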
With the rapid development of the XML language, which offers good flexibility and interoperability, more and more log files of software running information are represented in XML format, especially for Web services. Fault diagnosis by analyzing semi-structured, XML-like log files is therefore becoming an important issue in this area. Most related learning methods assume that training data share an identical structure, which does not hold in many practical situations. In order to learn from training data with different structures, we propose a similarity-based Bayesian learning approach for fault diagnosis in this paper. Our method first estimates the similarity degrees of structural elements from different log files. The basic structure of a combined Bayesian network (CBN) is then constructed, and a similarity-based learning algorithm is used to compute the probabilities in the CBN. Finally, test log data can be classified into possible fault categories based on the generated CBN. Experimental results show that our approach outperforms other learning approaches on training datasets with different structures.
Data indexing is common in data mining when working with high-dimensional, large-scale data sets. Hadoop, a cloud computing project using the MapReduce framework in Java, has become of significant interest in distributed data mining. A feasible distributed data indexing algorithm is proposed for Hadoop data mining, based on ZSCORE binning, inverted indexing, and the Hadoop SequenceFile format. A data mining framework on Hadoop using the Java Persistence API (JPA) and MySQL Cluster is also proposed, and is elaborated through the implementation of a decision tree algorithm on Hadoop. We compare the data indexing algorithm with Hadoop MapFile indexing, which performs a binary search, in a modest cloud environment; the results show the algorithm is more efficient than naïve MapFile indexing. We also compare the JDBC and JPA implementations of the data mining framework, and the performance shows the framework is efficient for data mining on Hadoop.
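A toy, single-machine version of the z-score binning plus inverted index idea might look like the following: each numeric value is assigned a bin by its distance from the column mean in standard deviations, and an inverted index maps (column, bin) to row ids so a lookup touches only rows in the matching bin. The bin width, the data, and the single-bin lookup policy are illustrative assumptions; the distributed SequenceFile machinery of the abstract is not modelled here.

```python
import statistics
from collections import defaultdict

def zscore_bin(value, mean, stdev, width=0.5):
    if stdev == 0:
        return 0
    # Floor of the z-score in units of `width`; bins are half-open intervals,
    # so values near a bin boundary can land in neighbouring bins.
    return int((value - mean) / stdev // width)

def build_index(rows):
    index = defaultdict(set)
    cols = list(zip(*rows))
    stats = [(statistics.mean(c), statistics.pstdev(c)) for c in cols]
    for rid, row in enumerate(rows):
        for col, v in enumerate(row):
            index[(col, zscore_bin(v, *stats[col]))].add(rid)
    return index, stats

def lookup(index, stats, col, value):
    mean, stdev = stats[col]
    return index.get((col, zscore_bin(value, mean, stdev)), set())

rows = [(1.0, 10.0), (1.1, 30.0), (5.0, 11.0), (5.2, 29.0)]
index, stats = build_index(rows)
hits = lookup(index, stats, 0, 5.1)        # rows whose column 0 is near 5.1
```

A practical lookup would usually also probe the adjacent bins to absorb boundary effects; the payoff over a naive binary search is that candidate rows are fetched with a single hash lookup per column.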
Support Vector Machine (SVM) is a classification technique of machine learning based on statistical learning theory. A quadratic optimization problem needs to be solved in the algorithm, and with the increase of the s...
Co-occurrence histograms of oriented gradients (CoHOG) are powerful descriptors in object detection. In this paper, we propose to utilize a very large pool of CoHOG features with variable-location and variable-size blocks to capture salient characteristics of the object structure. We consider a CoHOG feature as a block with a special pattern described by the offset. A boosting algorithm is further introduced to select the appropriate locations and offsets to construct an efficient and accurate cascade classifier. Experimental results on public datasets show that our approach simultaneously achieves high accuracy and fast speed on both pedestrian detection and car detection tasks.