ISBN: 9781479941735 (print)
Because the number of farmers has been decreasing in recent years, labor shortage is a serious problem in many farmhouses. To address this problem, a low-cost system that supports farmers' work is needed. The purpose of our research is to construct a system that can predict the farmland environment in the near future. In this research, we focus on the control of soil wetness and temperature. We formalize a model for expressing rules that predict temperature and soil wetness from the latest environmental data of a farmhouse. We show that these rules can be generated by the machine learning algorithm ID3. We examine the confidence of each prediction by comparing it with data obtained from an experiment in which farm products were cultivated in a greenhouse. Based on the results, we investigate which environmental factors are needed to create hypotheses for predicting changes in the environment.
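The abstract above states that prediction rules are generated with ID3. As a rough illustration only, the sketch below shows the attribute-selection step at the heart of ID3 (choosing the attribute with the highest information gain). The attribute names, values, and readings are hypothetical and are not taken from the paper's greenhouse data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3 picks the attribute with the largest information gain at each node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical, discretized sensor readings (assumption, for illustration only).
readings = [
    {"temp_now": "high", "watered": "yes", "soil_next": "wet"},
    {"temp_now": "high", "watered": "no", "soil_next": "dry"},
    {"temp_now": "low", "watered": "yes", "soil_next": "wet"},
    {"temp_now": "low", "watered": "no", "soil_next": "dry"},
]

root = best_attribute(readings, ["temp_now", "watered"], "soil_next")
```

A full ID3 implementation would apply `best_attribute` recursively to each subset until the labels in a subset are pure; each root-to-leaf path then reads as one prediction rule.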
ISBN: 9783319089799; 9783319089782 (print)
The increasing popularity of social media has resulted in the creation of a huge number of user-generated documents. Many research works have focused on inferring relationships in specific social network domains. Few have considered structured data to establish syntax-based relationships. In this work, we develop a two-step syntax-based and semantic-based relationship mining approach. We generalize the concept of relationship mining to all structured as well as unstructured, unsupervised text documents from all social network domains. First, we choose suitable features from each individual document and store them in a graph structure. Then we establish relationships in the generated graph to obtain a Reduced node Social Graph with Relationships (RSGR). Our empirical study on various social media documents validates the effectiveness of our approach and suggests its generality in finding relationships irrespective of the type of text document and the social network domain.
From the perspective of machine learning and data mining applications, expressing data in RDF rather than a domain-specific format can add complexity and obfuscate the internal structure. We investigate and illustrate this issue with an example where bio-molecular graph datasets are expressed in RDF. We use this example to inspire preprocessing techniques which reverse some of the complications of adding semantic annotations, exposing those patterns in the data that are most relevant to machine learning. We test these methods in a number of classification experiments and show that they can improve performance both for our example datasets and for real-world RDF datasets.
The proceedings contain 25 papers. The topics discussed include: cloud and mobile security: challenges and future research directions; DLP-technologies: new directions and trends; using fuzzy logic to evaluate trust in e-commerce; gamification of teaching and learning activity: prospect and challenges of mobile game-based learning; ComboSplit: combining various splitting criteria for building a single decision tree; text classification using computational model of the cerebral cortex; restricted Boltzmann machines for modeling businesses; variables selection for multiclass SVM using the multiclass radius margin bound; on the enumeration of frequent patterns in sequences; predicting movie incomes using search engine query data; best-parameterized sigmoid ELM for benign and malignant breast cancer detection; inference engine for classification of expert systems using keyword extraction technique; comparison of classifiers for retinal pathology images using SURF and bag-of-words model; content based video quality control for wide-area video surveillance systems; line detection by centre and width estimation; and interactive versus passive 2D face spoofing detection.
In one-class classification, the low variance directions in the training data carry crucial information for building a good model of the target class. Boundary-based methods like the One-Class Support Vector Machine (OSVM) preferentially separate the data from outliers along the large variance directions. On the other hand, retaining only the low variance directions can sacrifice some initial properties of the original data and is not desirable, especially when training samples are limited. This paper introduces a Covariance-guided One-Class Support Vector Machine (COSVM) classification method which emphasizes the low variance projectional directions of the training data without compromising any important characteristics. COSVM improves upon the OSVM method by controlling the direction of the separating hyperplane through incorporation of the covariance matrix estimated from the training data. Our proposed method is a convex optimization problem with a single global optimum, which can be solved efficiently with existing numerical methods. The method also keeps the principal structure of the OSVM method intact, and can be implemented easily with existing OSVM libraries. Comparative experimental results with contemporary one-class classifiers on numerous artificial and benchmark datasets demonstrate that our method results in significantly better classification performance. (C) 2014 Elsevier Ltd. All rights reserved.
ISBN: 9783319089799; 9783319089782 (print)
Flux domains are one of the most active threat vectors, and their behavior keeps changing to evade existing detection measures. To differentiate malicious flux domains from legitimate ones with similar behavior, such as content delivery network (CDN) and network time protocol (NTP) services, a novel time series model is created with a set of features that focus not only on domain name system (DNS) time-to-live (TTL) but also on the loyalty and entropy of DNS resource records. An offline system is built with big data technology to train the model in a semi-supervised mode. In addition, an online platform is designed and developed to support high-throughput, real-time processing of streaming DNS data with advanced analytics technologies. Feature extraction, classification, accuracy, and performance are discussed in this paper based on a large amount of real-world DNS data.
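The abstract above names entropy of DNS resource records as one of its features but does not define it. One plausible reading, sketched below purely as an assumption, is the Shannon entropy of the answer values (for example, resolved IP addresses) observed for a domain within a time window. The function name and the sample addresses (taken from documentation-reserved ranges) are hypothetical.

```python
import math
from collections import Counter


def record_entropy(answers):
    """Shannon entropy (in bits) of the values seen in a window of DNS answers.

    Assumption: `answers` is a list of record values (e.g. IP strings)
    observed for one domain; this is not the paper's stated definition.
    """
    total = len(answers)
    if total == 0:
        return 0.0
    counts = Counter(answers)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# A domain that always resolves to the same address has zero entropy.
stable = ["192.0.2.1"] * 4
# A domain that rotates through many addresses has higher entropy.
rotating = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "192.0.2.44"]

stable_entropy = record_entropy(stable)
rotating_entropy = record_entropy(rotating)
```

On its own, such a value would not separate flux domains from CDNs, which also rotate addresses; that is presumably why the abstract combines it with TTL and loyalty features in a time series model.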
ISBN: 9783319089799; 9783319089782 (print)
Concept drift can be considered a distribution mismatch problem in which the class distribution changes as time passes. This problem is commonly found in classification tasks in data mining. Among the proposed solutions, the cost-based Class Distribution Estimation (CDE) shows the best performance in coping with differences in class distribution between training and test datasets. However, a problem remains: CDE loses performance when the class distribution changes too much. In this paper, CDE-Weight is proposed to reduce the impact of large changes in class distribution. The idea is to use many models, each suited to a different class distribution, along with a dynamic weighting method that adjusts the weight of each model according to its class distribution. Experimental results indicate that CDE-Weight methods are able to reduce the impact of misestimation and improve classifier performance when training and test data differ.
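The abstract above describes weighting several models according to their class distributions, without giving the weighting formula. The sketch below is one hypothetical way to do this, not the CDE-Weight formula itself: each model is assumed to have been trained for a particular positive-class ratio, and models whose ratio is closer to the estimated test ratio receive larger weights. The inverse-distance rule, the function names, and the numbers are all assumptions.

```python
def distribution_weights(model_ratios, estimated_ratio, eps=1e-6):
    """Normalized weights that favor models trained near the estimated ratio.

    Assumption: inverse-distance weighting; `eps` avoids division by zero
    when a model's ratio matches the estimate exactly.
    """
    raw = [1.0 / (abs(r - estimated_ratio) + eps) for r in model_ratios]
    total = sum(raw)
    return [w / total for w in raw]


def combine(scores, weights):
    """Weighted average of the per-model scores for one test instance."""
    return sum(s * w for s, w in zip(scores, weights))


# Three hypothetical models, trained for 10%, 50%, and 90% positives.
model_ratios = [0.1, 0.5, 0.9]
# Hypothetical estimate of the positive-class ratio in the test data.
weights = distribution_weights(model_ratios, estimated_ratio=0.8)
# Hypothetical per-model positive-class scores for a single instance.
combined_score = combine([0.2, 0.6, 0.9], weights)
```

Because no single model is selected outright, an error in the estimated ratio shifts the weights gradually rather than switching to an entirely unsuitable model.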
Data mining can be used to model an individual learner's usage records and, combined with the learner's basic information, to analyze the learner's habits and personal preferences to provide personalized service f...
ISBN: 9781479942749 (print)
Classification of data points in a data stream poses a fundamentally different set of challenges than data mining on static data. While streaming data is often placed into the context of "Big Data" (or more specifically "Fast Data"), wherein one-pass algorithms are used, true data streams offer additional hurdles due to their dynamic, evolving, and non-stationary nature. During the stream, the available labels (or concepts) often change, and a concept's definition in the feature space can also evolve (or drift) over time. The core issue is that the hidden generative function of the data is not a constant function, but rather evolves over time. This is known as a non-stationary distribution. In this paper, we describe a new approach to using ensembles for stream classification. While the core method is straightforward, it is specifically designed to adapt quickly, with very little overhead, to the dynamic and evolving nature of data streams generated from non-stationary functions. Our method, M-3, is based on a weighted majority ensemble of heterogeneous model types where model weights are updated online using reinforcement learning techniques. We compare our method with current leading algorithms as implemented in the Massive Online Analysis (MOA) framework using UCI benchmark and synthetic stream generator data sets, and find that our method shows particularly strong gains over the baseline method when ground truth is of limited availability to the classifiers.
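To make the idea of a weighted majority ensemble with online weight updates concrete, here is a minimal sketch. It uses the classic multiplicative penalty (a wrong vote multiplies the model's weight by a factor `beta`) as a stand-in; the abstract says M-3 updates weights with reinforcement learning techniques, and that update rule is not reproduced here. The class name, `beta`, and the labels are assumptions.

```python
from collections import defaultdict


class WeightedMajority:
    """Weighted-vote ensemble whose weights shrink when a member errs."""

    def __init__(self, n_models, beta=0.5):
        self.weights = [1.0] * n_models
        self.beta = beta  # penalty factor in (0, 1) applied to wrong voters

    def predict(self, votes):
        """Return the label with the largest total weight among the votes."""
        tally = defaultdict(float)
        for vote, weight in zip(votes, self.weights):
            tally[vote] += weight
        return max(tally, key=tally.get)

    def update(self, votes, truth):
        """Penalize every model whose vote disagreed with the true label."""
        self.weights = [
            weight if vote == truth else weight * self.beta
            for vote, weight in zip(votes, self.weights)
        ]


ensemble = WeightedMajority(n_models=3)
votes = ["a", "b", "b"]           # hypothetical votes from three models
first = ensemble.predict(votes)   # the two "b" voters outweigh the "a" voter
ensemble.update(votes, truth="a")
ensemble.update(votes, truth="a")
later = ensemble.predict(votes)   # the "a" voter now carries the most weight
```

Since `update` is only called when a true label arrives, the weights simply stay fixed between labels, which is the setting the abstract highlights when ground truth is of limited availability.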
ISBN: 9781479930807 (print)
In this paper, we propose and implement data mining techniques for verifying handwriting recorded in an image. The captured images are treated as independent of the writing material. The system consists of six sub-modules: i) sample image data acquisition and preprocessing; ii) vector generation; iii) computation of clusters; iv) cluster head computation; v) pattern parameter extraction; vi) result. The first sub-module captures and categorizes the image for preprocessing. The preprocessed images are vectorized, and clusters are computed based on the degree of entropy in the vectors. Each cluster is then represented, through a chosen cluster head, by its degree of entropy and cluster type. Finally, parameters such as distance, entropy, and confidence are extracted from the clustering, and a result is generated for the given set of samples.