To develop an automatic and rapid detection method for enumerating total bacteria in juice, biomimetic pattern recognition and machine vision were employed. The characteristic data, such as shape, texture ...
Sensors are being deployed to improve border security, generating enormous collections of data and databases. Unfortunately, these sensors can respond to a variety of stimuli, sometimes reacting to meaningful events and...
Risk Management is a logical and systematic method of identifying, analyzing, treating and monitoring the risks involved in any activity or process. The key to successful risk management lies in the ability to tailor ...
Fuzzy neural network technology is one of the hot topics in data mining. According to the Max Similarity Rule, this paper sets forth the cross entropy theory with detailed formula derivations and a new activation ...
ISBN: 9781424478958 (print)
The proceedings contain 86 papers. The topics discussed include: iris features extraction using dual-tree complex wavelet transform; fuzzy methods for forensic data analysis; a new weighted rough set framework for imbalance class distribution; multi stereo camera data fusion for fingertip detection in gesture recognition systems; recognition of signed expressions using visually-oriented subunits obtained by an immune-based optimization; 3-D object recognition based on SVM and stereo-vision: application in endoscopic imaging; inter-camera color calibration for object re-identification and tracking; improving the accuracy of intrusion detection systems by using the combination of machine learning approaches; mining web videos for video quality assessment; ultra fast fingerprint indexing for embedded system; damageless image hashing using neural network; and classification by means of fuzzy analogy-related proportions - a preliminary report.
ISBN: 9783642130243 (print)
The overwhelming amount of data that is available nowadays makes many of the existing machine learning algorithms inapplicable to many real-world problems. Two approaches have been used to deal with this problem: scaling up data mining algorithms [1] and data reduction. Nevertheless, scaling up a certain algorithm is not always feasible. One of the most common methods for data reduction is feature selection, but when we face large problems, scalability becomes an issue. This paper presents a way of removing this difficulty using several rounds of feature selection on subsets of the original dataset, combined using a voting scheme. The performance is very good in terms of testing error and storage reduction, while the execution time of the process is decreased very significantly. The method is especially efficient when we use feature selection algorithms that have a high computational cost. An extensive comparison on 27 datasets of medium and large sizes from the UCI Machine Learning Repository, using different classifiers, shows the usefulness of our method.
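The idea of running feature selection on subsets and combining the results by voting can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the stand-in selector `select_top_k` (ranking features by the gap between class means on binary labels), the disjoint random partition of instances, and the `min_votes` threshold.

```python
import random
from collections import Counter


def select_top_k(rows, labels, k):
    """Hypothetical stand-in selector: rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            # A subset with a single class carries no signal for this selector.
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def vote_select(rows, labels, n_subsets, k, min_votes, seed=0):
    """Run the selector on disjoint random subsets and keep features with enough votes."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    votes = Counter()
    for s in range(n_subsets):
        part = idx[s::n_subsets]  # every n_subsets-th shuffled instance
        sub_rows = [rows[i] for i in part]
        sub_labels = [labels[i] for i in part]
        votes.update(select_top_k(sub_rows, sub_labels, k))
    return sorted(j for j, v in votes.items() if v >= min_votes)
```

Because each round sees only a fraction of the instances, an expensive selector runs on much smaller inputs, which is where the time saving described above would come from.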
ISBN: 9781605588896 (print)
The presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we present a set of techniques to mine rules from URLs and utilize these rules for de-duplication using just URL strings without fetching the content explicitly. Our technique is composed of mining the crawl logs and utilizing clusters of similar pages to extract transformation rules, which are used to normalize URLs belonging to each cluster. Preserving each mined rule for de-duplication is not efficient due to the large number of such rules. We present a machine learning technique to generalize the set of rules, which reduces the resource footprint to be usable at web-scale. The rule extraction techniques are robust against web-site specific URL conventions. We compare the precision and scalability of our approach with recent efforts in using URLs for de-duplication. Experimental results demonstrate that our approach achieves 2 times more reduction in duplicates with only half the rules compared to the most recent previous approach. Scalability of the framework is demonstrated by performing a large scale evaluation on a set of 3 billion URLs, implemented using the MapReduce framework. Copyright 2010 ACM.
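To make the idea of de-duplicating from URL strings alone concrete, here is a minimal sketch of applying one normalization rule and grouping URLs by their normalized form. It does not reproduce the paper's rule mining or generalization; the rule itself (dropping query parameters listed in a hypothetical `ignored_params` set, such as a session id) and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url, ignored_params):
    """Apply one hand-written rule: drop ignored query parameters and sort the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in ignored_params
    )
    # Lower-case the host and discard the fragment; neither identifies content.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_duplicates(urls, ignored_params):
    """Group URLs that map to the same normalized string, without fetching any page."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url, ignored_params)].append(url)
    return dict(groups)
```

For example, `group_duplicates(["http://Example.com/a?sid=1&id=7", "http://example.com/a?id=7&sid=99"], {"sid"})` places both URLs under the single key `http://example.com/a?id=7`.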
ISBN: 9781605588896 (print)
Tagging plays an important role in many recent websites. Recommender systems can help to suggest to a user the tags they might want to use for tagging a specific item. Factorization models based on the Tucker Decomposition (TD) model have been shown to provide high quality tag recommendations outperforming other approaches like PageRank, FolkRank, collaborative filtering, etc. The problem with TD models is the cubic core tensor, which results in a cubic runtime in the factorization dimension for prediction and learning. In this paper, we present the factorization model PITF (Pairwise Interaction Tensor Factorization), which is a special case of the TD model with linear runtime both for learning and prediction. PITF explicitly models the pairwise interactions between users, items and tags. The model is learned with an adaptation of the Bayesian personalized ranking (BPR) criterion, which was originally introduced for item recommendation. Empirically, we show on real world datasets that this model substantially outperforms TD in runtime and can even achieve better prediction quality. Besides our lab experiments, PITF has also won the ECML/PKDD Discovery Challenge 2009 for graph-based tag recommendation. Copyright 2010 ACM.
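The linear-time prediction claim can be illustrated with a minimal scoring sketch. It assumes the score of a (user, item, tag) triple is the sum of a user-tag and an item-tag inner product; a user-item term is left out because it is constant across tags for a fixed user and item, so it cannot change the ranking. The class layout, the initialization scale, and the names `U`, `I`, `TU`, `TI` are assumptions for illustration, and the BPR training step is not shown.

```python
import random


def dot(a, b):
    """Inner product of two equal-length factor vectors, O(k)."""
    return sum(x * y for x, y in zip(a, b))


def init_factors(n_rows, k, rng):
    """Small random factors; the 0.1 scale is an arbitrary illustrative choice."""
    return [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_rows)]


class PITF:
    """Pairwise-interaction scoring only (untrained sketch)."""

    def __init__(self, n_users, n_items, n_tags, k=8, seed=0):
        rng = random.Random(seed)
        self.n_tags = n_tags
        self.U = init_factors(n_users, k, rng)   # user factors
        self.I = init_factors(n_items, k, rng)   # item factors
        self.TU = init_factors(n_tags, k, rng)   # tag factors paired with users
        self.TI = init_factors(n_tags, k, rng)   # tag factors paired with items

    def score(self, u, i, t):
        # Two inner products of length k: linear in the factorization dimension.
        return dot(self.U[u], self.TU[t]) + dot(self.I[i], self.TI[t])

    def top_tags(self, u, i, n=3):
        """Rank all tags for one (user, item) post and return the n best tag ids."""
        ranked = sorted(
            range(self.n_tags), key=lambda t: self.score(u, i, t), reverse=True
        )
        return ranked[:n]
```

By contrast, a full Tucker model would combine the three factor vectors through a k × k × k core tensor, which is where the cubic cost mentioned above comes from.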
ISBN: 9781605588896 (print)
We consider the problem of semi-supervised learning to extract categories (e.g., academic fields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised training using only a few labeled examples is typically unreliable because the learning task is underconstrained. This paper pursues the thesis that much greater accuracy can be achieved by further constraining the learning task, by coupling the semi-supervised training of many extractors for different categories and relations. We characterize several ways in which the training of category and relation extractors can be coupled, and present experimental results demonstrating significantly improved accuracy as a result. Copyright 2010 ACM.
In clustering methods, the estimation of the optimal number of clusters is significant for subsequent analysis. As a simple clustering method, the fuzzy c-means algorithm (FCM) has been widely discussed and applied in...