Author:
S.G. Djorgovski, Division of Physics, Mathematics and Astronomy, and Center for Advanced Computing Research, California Institute of Technology, Pasadena, CA, USA
All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promise to transform scientific practice, but also pose a number of common technological challenges. The virtual observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. The challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, and fast networks; and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, and visualization in highly hyperdimensional parameter spaces, as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on modern society, commerce, the information economy, security, etc. There is a powerful emerging synergy between computationally enabled science and science-driven computing, which will drive progress in science, scholarship, and many other venues in the 21st century.
ISBN:
(Print) 3540225552
Feature selection plays a central role in data analysis and is also a crucial step in machine learning, data mining and pattern recognition. A feature selection algorithm focuses mainly on the design of a criterion function and the selection of a search strategy. In this paper, a novel feature selection approach (NFSA) based on a quantum genetic algorithm (QGA) and a good evaluation criterion is proposed to select the optimal feature subset from a large number of features extracted from radar emitter signals (RESs). The criterion function is given first. Then, the QGA is described in detail and its performance is analyzed. Finally, the best feature subset is selected from the original feature set (OFS) composed of 16 features of RESs. Experimental results show that the proposed approach greatly reduces the dimensionality of the OFS and raises the recognition accuracy for RESs, which indicates that the NFSA is feasible and effective.
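The subset search the abstract describes can be illustrated with a plain genetic algorithm over feature bitstrings. This is a simplified sketch, not the paper's quantum genetic algorithm, and the criterion function here is a toy stand-in (it pretends features 2, 5 and 11 are informative and penalizes subset size, standing in for the paper's class-separability criterion):

```python
import random

N_FEATURES = 16          # size of the original feature set (OFS) in the paper
POP, GENS, P_MUT = 20, 40, 0.05

# Toy criterion: hypothetical informative features, minus a size penalty.
INFORMATIVE = {2, 5, 11}

def criterion(bits):
    hits = sum(1 for i, b in enumerate(bits) if b and i in INFORMATIVE)
    return hits - 0.1 * sum(bits)

def select(pop):
    # Tournament selection of size 2: keep the fitter of two random individuals.
    a, b = random.sample(pop, 2)
    return a if criterion(a) >= criterion(b) else b

def crossover(p1, p2):
    cut = random.randrange(1, N_FEATURES)   # one-point crossover
    return p1[:cut] + p2[cut:]

def mutate(bits):
    return [b ^ 1 if random.random() < P_MUT else b for b in bits]

def run(seed=0):
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(POP)]
    for _ in range(GENS):
        pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]
    return max(pop, key=criterion)

best = run()
print(sorted(i for i, b in enumerate(best) if b))   # selected feature indices
```

A QGA replaces the bitstring population with qubit-amplitude individuals and rotation-gate updates, but the criterion-driven search loop has the same shape.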
ISBN:
(Print) 3540233393
The proceedings contain 36 papers. The special focus in this conference is on Biosystems for IT Evolution, Bio-inspired Software Systems, Hardware Systems and Robotics. The topics include: object-oriented specification of complex bio-computing processes; analysis of responses of complex bionetworks to changes in environmental conditions; experimental molecular evolution showing flexibility of fitness leading to coexistence and diversification in biological systems; learning Bayesian networks by Lamarckian genetic algorithm and its application to yeast cell-cycle gene network reconstruction from time-series microarray data; biologically inspired reinforcement learning; biologically plausible speech recognition with LSTM neural nets; spatial tangible user interfaces for cognitive assessment and training; biologically inspired computer virus detection system; explaining low-level brightness-contrast illusions using disinhibition; autonomous acquisition of the meaning of sensory states through sensory-invariance driven action; embryonic machines that divide and differentiate; a hardware implementation of a network of functional spiking neurons with Hebbian learning; a study on designing robot controllers by using reinforcement learning with evolutionary state recruitment strategy; movement generation and control with generic neural microcircuits; anatomy and physiology of an artificial vision matrix; an adaptive mechanism for epidemic communication; distributed central pattern generator model for robotics application based on phase sensitivity analysis; ant-based approach to mobile agent traversal; media streaming on P2P networks with bio-inspired cache replacement algorithm; and scalable and robust scheme for data gathering in sensor networks.
ISBN:
(Print) 1402081502
The proceedings contain 40 papers. The special focus in this conference is on Artificial Intelligence Applications and Innovations. The topics include: artificial intelligence systems in micromechanics; integrating two artificial intelligence theories in a medical diagnosis application; artificial intelligence and law; virtual market environment for trade; an artificial neural networks approach to the estimation of physical stellar parameters; evolutionary robot behaviors based on natural selection and neural network; control of overhead crane by fuzzy-PID with genetic optimisation; creative design of fuzzy logic controller; on-line extraction of fuzzy rules in a wastewater treatment plant; finding manufacturing expertise using ontologies and cooperative agents; using agents in the exchange of product data; a pervasive identification and adaptation system for the smart house; deductive diagnosis of digital circuits; verification of NASA emergent systems; learning Bayesian metanetworks from data with multilevel uncertainty; using organisational structures emergence for maintaining functional integrity in embedded systems networks; efficient attribute reduction algorithm; using relative logic for pattern recognition; a multi-agent intelligent tutoring system; analysis and intelligent support of learning communities in semi-structured discussion environments; an adaptive assessment system to evaluate student ability level; forming the optimal team of experts for collaborative work; introducing a star topology into latent class models for collaborative filtering; an agency for semantic-based automatic discovery of web services; using genetic algorithms and tabu search parallel models to solve the scheduling problem; modelling document categories by evolutionary learning of text centroids; AIR - a platform for intelligent systems; data mining by MOUCLAS algorithm for petroleum reservoir characterization from well logging data; online possibilistic diagnosis based on expert knowledge for engine dyno
ISBN:
(Print) 1581139896
Complex distributed Internet services form the basis not only of e-commerce but increasingly of mission-critical network-based applications. What is new is that the workload and internal architecture of three-tier enterprise applications present the opportunity for a new approach to keeping them running in the face of many common recoverable failures. The core of the approach is anomaly detection and localization based on statistical machine learning techniques. Unlike previous approaches, we propose anomaly detection and pattern mining not only for operational statistics such as mean response time, but also for structural behaviors of the system: what parts of the system, in what combinations, are being exercised in response to different kinds of external stimuli. In addition, rather than building baseline models a priori, we extract them by observing the behavior of the system over a short period of time during normal operation. We explain the necessary underlying assumptions and why they can be realized by systems research, report on some early successes using the approach, describe benefits of the approach that make it competitive as a path toward self-managing systems, and outline some research challenges. Our hope is that this approach will enable "new science" in the design of self-managing systems by allowing the rapid and widespread application of statistical learning theory (SLT) techniques to problems of system dependability.
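The structural-behavior idea (learn a baseline from a window of normal operation, then flag windows whose component-interaction mix diverges) can be sketched minimally. The path strings and the symmetrized KL-style divergence below are illustrative choices, not the paper's specific model:

```python
from collections import Counter
import math

def baseline(paths):
    """Learn a baseline distribution over observed request paths
    from a window of normal operation (extracted by observation,
    not built a priori)."""
    counts = Counter(paths)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def anomaly_score(model, window, eps=1e-6):
    """Symmetrized KL-style divergence between the baseline path
    distribution and a recent window; large values flag structural
    anomalies (new paths, or familiar paths in unusual proportions)."""
    counts = Counter(window)
    total = sum(counts.values())
    score = 0.0
    for k in set(model) | set(counts):
        p = model.get(k, eps)                    # unseen under baseline
        q = counts.get(k, 0) / total or eps      # unseen in window
        score += (p - q) * math.log(p / q)
    return score

# Hypothetical three-tier request paths.
normal = ["web>app>db"] * 90 + ["web>app>cache"] * 10
model = baseline(normal)
print(anomaly_score(model, normal))   # near zero on normal traffic
print(anomaly_score(model, ["web>app>db"] * 50 + ["web>err"] * 50))
```

Localization would then attribute the score to the components whose terms dominate the sum.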
A fast support vector machine (SVM) training algorithm is proposed under SVM's decomposition framework by effectively integrating kernel caching, digest and shrinking policies, and stopping conditions. Kernel caching plays a key role in reducing the number of kernel evaluations through maximal reuse of cached kernel elements. Extensive experiments conducted on the large handwritten digit database MNIST show that the proposed algorithm is about nine times faster than Keerthi et al.'s improved SMO. Combined with principal component analysis, the total training for ten one-against-the-rest classifiers on MNIST took less than an hour. Moreover, the proposed fast algorithm speeds up SVM training without sacrificing generalization performance: a 0.6% error rate on the MNIST test set has been achieved. The promising scalability of the proposed scheme paves a new way to solve larger-scale learning problems in other domains such as data mining.
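The kernel-caching policy the abstract credits for the speedup can be sketched as an LRU map over kernel entries: reuse a stored K(i, j) when possible and recompute only on a miss. The class below is a minimal illustration (the paper's cache organization and eviction details may differ):

```python
from collections import OrderedDict

class KernelCache:
    """LRU cache for kernel evaluations K(i, j)."""
    def __init__(self, kernel, capacity=10000):
        self.kernel = kernel
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def get(self, xi, xj, i, j):
        key = (i, j) if i <= j else (j, i)    # kernels are symmetric
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)       # mark as recently used
            return self.store[key]
        self.misses += 1
        val = self.kernel(xi, xj)
        self.store[key] = val
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)    # evict least recently used
        return val

# Toy usage with a linear kernel on 2-D points.
dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
cache = KernelCache(dot, capacity=2)
x = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
cache.get(x[0], x[1], 0, 1)
cache.get(x[1], x[0], 1, 0)   # symmetric lookup hits the cache
print(cache.hits, cache.misses)
```

In a decomposition solver, the working-set loop repeatedly touches the same rows of the kernel matrix, which is why a high hit rate translates directly into fewer kernel evaluations.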
Training of support vector machines (SVMs) amounts to solving a quadratic programming problem over the training data. We present a simple on-line SVM training algorithm of complexity approximately linear in the number of training vectors, and linear in the number of support vectors. The algorithm implements an on-line variant of sequential minimal optimization (SMO) that avoids the need for adjusting select pairs of training coefficients by adjusting the bias term along with the coefficient of the currently presented training vector. The coefficient assignment is a function of the margin returned by the SVM classifier prior to assignment, subject to inequality constraints. The training scheme lends itself efficiently to dedicated SVM hardware for real-time pattern recognition, implemented using resources already provided for run-time operation. Performance gains are illustrated using the Kerneltron, a massively parallel mixed-signal VLSI processor for kernel-based real-time video recognition.
ISBN:
(Print) 354044016X
A challenge for statistical learning is to deal with large data sets, e.g. in data mining. The training time of ordinary Support Vector Machines is at least quadratic, which raises a serious research challenge if we want to deal with data sets of millions of examples. We propose a "hard parallelizable mixture" methodology which yields significantly reduced training time through modularization and parallelization: the training data is iteratively partitioned by a "gater" model in such a way that it becomes easy to learn an "expert" model separately in each region of the partition. A probabilistic extension and the use of a set of generative models allow representing the gater so that all pieces of the model are trained locally. For SVMs, time complexity empirically appears to grow linearly with the number of examples, while generalization performance can be enhanced. For the probabilistic version of the algorithm, the iterative algorithm provably decreases a cost function that is an upper bound on the negative log-likelihood.
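The alternation between the gater's partitioning and per-region expert training can be illustrated on 1-D data. This sketch replaces the SVM experts with constant predictors and uses a nearest-centroid gater, purely to show the loop structure; it is not the paper's model:

```python
def train_mixture(xs, ys, n_experts=2, rounds=5):
    """Iterate: gater routes examples to regions, then each expert
    trains only on its own partition (the step the paper runs in
    parallel). Experts here are mean-label stubs, not SVMs."""
    centers = xs[:n_experts]          # initial region centers
    preds = [0.0] * n_experts
    for _ in range(rounds):
        # Gater: route each example to the nearest region center.
        assign = [min(range(n_experts), key=lambda e: abs(x - centers[e]))
                  for x in xs]
        # Experts: refit independently on their partitions.
        for e in range(n_experts):
            idx = [i for i, a in enumerate(assign) if a == e]
            if idx:
                centers[e] = sum(xs[i] for i in idx) / len(idx)
                preds[e] = sum(ys[i] for i in idx) / len(idx)
    return centers, preds

# Two well-separated clusters with opposite labels.
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ys = [-1, -1, -1, 1, 1, 1]
centers, preds = train_mixture(xs, ys)
print(centers, preds)
```

Since each expert only ever sees its own region, the quadratic SVM training cost applies to much smaller subsets, which is the source of the near-linear overall scaling reported in the abstract.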