ISBN: (print) 9783030034931; 9783030034924
Industrial data mining projects in general, and big data mining projects in particular, suffer from long project execution times. The resulting high costs make many otherwise interesting use cases economically unattractive. Using anomaly detection for process plants as an example, this contribution shows how the major obstacles, namely inefficient development tools for big data frameworks such as Apache Hadoop and Spark, and the lack of reuse of software artifacts across projects, can be overcome. This is achieved by selecting an application case that shares considerable commonalities across different projects and by providing a supported project workflow implemented in a scalable and extensible big data architecture.
ISBN: (print) 9783030034931; 9783030034924
Video anomaly detection (VAD) is one of the most attractive problems in fields such as computer vision. In this paper, we propose a method for modeling a VAD classifier that learns in a supervised manner. The basic idea is to address the shortage of labeled data through transfer learning. The key idea is to build the underlying model for transfer learning from the discriminator of a GAN. We solve this problem by proposing a GAN model consisting of a generator that generates video sequences and a discriminator that follows the LRCN structure. In our experiments, the VAD classifier trained through GAN-based transfer learning achieved higher accuracy and recall than the pure LRCN classifier and other machine learning methods. Additionally, we demonstrate that the generator is able to stably generate images similar to the actual data as training progresses. To the best of our knowledge, this paper is the first to solve the VAD problem using a GAN model in a supervised learning manner.
ISBN: (print) 9783030034931; 9783030034924
In this paper we propose a simple unsupervised approach to learning higher-order features. This model builds on the recent success of lightweight approaches such as SOMNet and PCANet on the challenging task of image classification. In contrast to more complex deep learning models such as convolutional neural networks (CNNs), these methods use naive algorithms to model the input distribution. Our endeavour focuses on the self-organizing map (SOM) based method and extends it by incorporating a competitive connection layer between filter learning stages. This simple addition encourages the second filter learning stage to learn complex combinations of first-layer filters while simultaneously decreasing channel depth. This approach to learning complex representations offers a competitive alternative to common deep learning models while maintaining an efficient framework. We test our proposed approach on the popular MNIST and the challenging CIFAR-10 datasets.
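SOMNet-style pipelines learn their filters as the prototype vectors of a self-organizing map. The sketch below shows only that basic online SOM update on toy 2-D vectors standing in for flattened image patches; the competitive connection layer proposed here is not reproduced, and the function name `train_som` and its learning-rate and neighbourhood schedules are illustrative assumptions.

```python
import math


def train_som(data, prototypes, steps, lr0=0.5, sigma0=1.0, sigma_min=0.1, lr_min=0.01):
    """Online SOM with prototypes on a 1-D grid (grid position = list index).

    Hypothetical helper for illustration; schedules are assumptions, not SOMNet's."""
    w = [list(p) for p in prototypes]
    for t in range(steps):
        x = data[t % len(data)]  # cycle through the inputs deterministically
        frac = t / steps
        lr = lr0 + (lr_min - lr0) * frac  # learning rate decays linearly
        sigma = sigma0 + (sigma_min - sigma0) * frac  # neighbourhood width shrinks linearly
        # best-matching unit: prototype with the smallest squared Euclidean distance to x
        bmu = min(range(len(w)), key=lambda i: sum((a - b) ** 2 for a, b in zip(w[i], x)))
        for i in range(len(w)):
            # Gaussian neighbourhood measured in grid distance from the BMU
            h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
            w[i] = [a + lr * h * (b - a) for a, b in zip(w[i], x)]  # pull prototype toward x
    return w
```

Trained on the two inputs (0, 0) and (1, 1) for 200 steps, starting from prototypes (0.4, 0.4) and (0.6, 0.6), the first prototype moves toward (0, 0) and the second toward (1, 1).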
In this paper, we consider the spatial-temporal data modeling problem with a large number of time instants and a moderate number of locations. The problem is formulated as a function estimation problem and then handled by Gaussian process regression. To lower the computational complexity, we first sample the continuous-time kernel to obtain a discrete-time kernel, and then derive its discrete-time state-space model realization, which is free of numerical problems. We then convert the Gaussian process regression problem into a Kalman filtering and smoothing problem that covers both hyper-parameter estimation and prediction. We consider three hyper-parameter estimation methods: marginal likelihood maximization, generalized cross-validation, and Stein's unbiased risk estimation. The proposed implementation is tested on a simulated data set and on the Colorado weather data set.
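The abstract does not name the kernel, so the following is a minimal sketch assuming the exponential (Matérn-1/2) kernel $k(t,t') = \sigma^2 e^{-|t-t'|/\ell}$, whose sampled form has an exact one-dimensional state-space realization: $x_{k+1} = a_k x_k + w_k$ with $a_k = e^{-(t_{k+1}-t_k)/\ell}$ and $w_k \sim \mathcal{N}(0, \sigma^2(1-a_k^2))$. A Kalman filter plus Rauch-Tung-Striebel smoother then returns the GP posterior mean and the log marginal likelihood (the quantity maximized by the first hyper-parameter method) in $O(n)$ time. The direct $O(n^3)$ solve is included only as a cross-check; both function names are hypothetical.

```python
import math


def gp_kalman(ts, ys, sigma2, ell, noise2):
    """GP regression with k(t, t') = sigma2 * exp(-|t - t'| / ell) via Kalman filter + RTS smoother.

    ts must be sorted ascending. Returns (posterior means at ts, log marginal likelihood)."""
    n = len(ts)
    m_pred, P_pred, m_filt, P_filt, A = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n, [1.0] * n
    m, P, loglik = 0.0, sigma2, 0.0  # stationary prior x(t_0) ~ N(0, sigma2)
    for k in range(n):
        if k > 0:
            a = math.exp(-(ts[k] - ts[k - 1]) / ell)  # sampled (discrete-time) transition
            A[k] = a
            m, P = a * m, a * a * P + sigma2 * (1.0 - a * a)  # process noise keeps variance stationary
        m_pred[k], P_pred[k] = m, P
        S = P + noise2  # innovation variance
        v = ys[k] - m  # innovation
        loglik -= 0.5 * (math.log(2.0 * math.pi * S) + v * v / S)  # prediction-error decomposition
        K = P / S  # Kalman gain
        m, P = m + K * v, (1.0 - K) * P
        m_filt[k], P_filt[k] = m, P
    m_s = m_filt[:]
    for k in range(n - 2, -1, -1):  # backward Rauch-Tung-Striebel pass for the means
        G = P_filt[k] * A[k + 1] / P_pred[k + 1]
        m_s[k] = m_filt[k] + G * (m_s[k + 1] - m_pred[k + 1])
    return m_s, loglik


def gp_direct(ts, ys, sigma2, ell, noise2):
    """Reference O(n^3) computation of the same posterior mean and log marginal likelihood."""
    n = len(ts)
    K = [[sigma2 * math.exp(-abs(ti - tj) / ell) for tj in ts] for ti in ts]
    M = [K[i][:] + [ys[i]] for i in range(n)]  # augmented [K + noise2*I | y]
    for i in range(n):
        M[i][i] += noise2
    for c in range(n):  # Gaussian elimination; SPD matrix, so no pivoting is needed
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    alpha = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution: alpha = (K + noise2*I)^{-1} y
        alpha[i] = (M[i][n] - sum(M[i][j] * alpha[j] for j in range(i + 1, n))) / M[i][i]
    logdet = sum(math.log(M[i][i]) for i in range(n))  # determinant = product of pivots
    quad = sum(y * a for y, a in zip(ys, alpha))
    means = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    return means, -0.5 * (quad + logdet + n * math.log(2.0 * math.pi))
```

On small inputs the two functions agree to rounding error; only the linear-time recursion would be used on long time series.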
It has long been suggested that commit messages can greatly facilitate code comprehension. However, developers may not write good commit messages in practice. Neural machine translation (NMT) has been suggested for automatically generating commit messages. Despite efforts to improve NMT algorithms, the quality of the generated commit messages is not yet satisfactory. Instead of improving NMT algorithms, this paper suggests that properly preprocessing code changes into concise inputs is critical for training NMT. We approach this with semantic analysis of code changes. We collect a real-world dataset of 50k+ commits from popular Java projects and verify our idea with comprehensive experiments. The results show that preprocessing inputs with code semantic analysis can improve NMT significantly. This work sheds light on how to apply existing DNNs designed by the machine learning community, e.g., NMT models, to software engineering tasks.
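The semantic analysis used in the paper is not specified here. As a much simpler, purely syntactic stand-in (an assumption, not the authors' method), the hypothetical `condense_diff` below keeps only the added and removed lines of a unified diff, drops blank and comment-only lines, and tokenizes the rest, which illustrates the general idea of turning a raw diff into a shorter NMT input.

```python
import re

# identifiers, integer literals, or any single non-space symbol
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def condense_diff(diff_text):
    """Reduce a unified diff to a token sequence of its changed lines only.

    Simplification: file headers are recognised by the '+++'/'---' prefix, so a changed
    line whose own text starts with '++' or '--' is skipped as well."""
    tokens = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@")):
            continue  # file headers and hunk markers
        if not line.startswith(("+", "-")):
            continue  # unchanged context lines
        body = line[1:].strip()
        if not body or body.startswith("//"):
            continue  # blank lines and Java line comments
        tokens.append(line[0])  # keep the edit direction ('+' added, '-' removed)
        tokens.extend(TOKEN.findall(body))
    return tokens
```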
ISBN: (print) 9783030034962; 9783030034955
The difficulty of many classification tasks lies in the nature of the analyzed data, for example a disproportionate number of examples from different classes in the learning set. Ignoring this characteristic causes canonical classifiers to display strongly biased performance on imbalanced datasets. In this work, a novel technique for forming classifier ensembles for imbalanced datasets is presented. On the one hand, it takes into consideration the selected features used for training individual classifiers; on the other hand, it ensures appropriate diversity of the classifier ensemble. The proposed method was tested in computer experiments carried out on several benchmark datasets. Their results seem to confirm the usefulness of the proposed concept.
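The abstract does not state how ensemble diversity is measured. One common choice, assumed here purely for illustration, is the pairwise disagreement measure: the fraction of samples on which two ensemble members predict different labels, averaged over all pairs of members. The function names are hypothetical.

```python
from itertools import combinations


def disagreement(p, q):
    """Fraction of samples on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def ensemble_diversity(predictions):
    """Average pairwise disagreement over all classifier pairs in the ensemble.

    predictions: one list of predicted labels per classifier, all on the same samples."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(p, q) for p, q in pairs) / len(pairs)
```

For three classifiers with predictions `[0, 0, 1, 1]`, `[0, 1, 1, 1]`, and `[1, 0, 1, 0]`, the pairwise disagreements are 0.25, 0.5, and 0.75, so the ensemble diversity is 0.5.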
ISBN: (print) 9783030034931; 9783030034924
Companies in all areas of business and industry can, due to an unfavourable financial situation or inappropriate investments, face financial problems resulting in bankruptcy. The ability to foresee imminent bankruptcy helps managers and stockholders take corrective action. In this paper, we analyze the annual reports of thousands of limited liability companies and propose a bankruptcy prediction model. The available dataset is strongly imbalanced, which corresponds to the real-world situation in which bankrupt companies constitute only a small fraction of all companies. The proposed model is based on a single-class least-squares anomaly detection classifier and achieves a prediction accuracy as high as 91%.
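A minimal sketch of a least-squares one-class scorer, written in the spirit of least-squares anomaly detection but not claimed to match the authors' model: assuming the single training class is the non-bankrupt companies, a kernel model is fitted so that f(x) is close to 1 on those companies, and a low f(x) then marks a company as anomalous (a bankruptcy candidate). The Gaussian kernel, bandwidth `h`, regularization `lam`, and the function names are assumptions; real inputs would be feature vectors derived from the annual reports.

```python
import math


def _gauss(x, c, h):
    """Gaussian kernel between feature vectors x and c with bandwidth h."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * h * h))


def fit_ls_one_class(train, h=0.5, lam=0.1):
    """Fit theta minimising sum_i (1 - f(x_i))^2 + lam * ||theta||^2,
    with f(x) = sum_j theta_j * k(x, c_j) and centres c_j = training points.
    Returns the scoring function f (about 1 for normal data, about 0 far away)."""
    n = len(train)
    Phi = [[_gauss(x, c, h) for c in train] for x in train]
    # augmented normal equations [Phi^T Phi + lam*I | Phi^T 1]
    M = [[sum(Phi[k][i] * Phi[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(n)] + [sum(Phi[k][i] for k in range(n))] for i in range(n)]
    for c in range(n):  # Gaussian elimination (symmetric positive definite system)
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    theta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        theta[i] = (M[i][n] - sum(M[i][j] * theta[j] for j in range(i + 1, n))) / M[i][i]
    return lambda x: sum(t * _gauss(x, c, h) for t, c in zip(theta, train))
```

A decision threshold on the score (for example 0.5) would be a further tuning choice; it is not taken from the paper.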
Falls in seniors can lead to serious physical and psychological consequences. A fall detector can allow a fallen person to receive medical intervention promptly after the incident. The accelerometer data from smartpho...
This paper presents a Learning Feedback Linearization (LFL) based Nonlinear Auto-Regressive Moving Average (NARMA) controller design for a rotary inverted pendulum (ROTPEN) plant. The proposed NARMA controller comprises a linear controller and an LFL block. The LFL block, concatenated with the nonlinear plant, constitutes a linear closed-loop system, so that linear control is applicable. An online learning algorithm is used for the data-dependent identification of the linearized plant and then for the data-dependent design of the linear part of the NARMA controller. The identification of the linearized plant starts with the determination of the LFL block in a supervised way, by exploiting the input and the corresponding state data obtained from the nonlinear plant. The linearized plant is then identified as an ARMA model from the data generated by the combination of the already learned LFL block and the nonlinear plant. Robustness of the linearized system model is obtained by employing the ε-insensitive loss function $\ell_{1,\varepsilon}(\cdot,\cdot)$ as the identification error of the linearized system. The Schur stability of the overall closed-loop system is ensured by linear inequality constraints imposed in the minimization of the $\ell_{1,\varepsilon}(\cdot,\cdot)$ tracking error that determines the linear controller parameters. The proposed LFL-based NARMA controller is tested on the ROTPEN model, and its performance is compared with that of a Proportional-Derivative controller and a Hammerstein-based NARMA adaptive controller.
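The two ingredients named in the abstract can be sketched in isolation. `eps_insensitive` implements $\ell_{1,\varepsilon}(y,\hat y) = \max(0, |y-\hat y| - \varepsilon)$, and `is_schur_stable` applies the Schur-Cohn (Jury) step-down test to a monic characteristic polynomial. Note that this is a post-hoc check of a given closed-loop polynomial, whereas the paper enforces Schur stability through linear inequality constraints inside the optimization, which is not reproduced here. Both function names are hypothetical.

```python
def eps_insensitive(y, y_hat, eps):
    """l_{1,eps} loss: absolute error, ignoring deviations no larger than eps."""
    return max(0.0, abs(y - y_hat) - eps)


def is_schur_stable(coeffs):
    """Schur-Cohn step-down test for p(z) = z^n + a_1 z^(n-1) + ... + a_n.

    coeffs = [1, a_1, ..., a_n] (real, monic). Returns True iff every root
    lies strictly inside the unit circle."""
    a = [float(c) for c in coeffs]
    while len(a) > 1:
        n = len(a) - 1
        k = a[n]  # reflection coefficient of the current degree-n polynomial
        if abs(k) >= 1.0:
            return False
        # step down to degree n-1: a'_i = (a_i - k * a_{n-i}) / (1 - k^2); a'_0 stays 1
        a = [(a[i] - k * a[n - i]) / (1.0 - k * k) for i in range(n)]
    return True
```

For example, $z^2 - 1.5z + 0.56 = (z-0.7)(z-0.8)$ passes the test, while $z^2 - 1.3z + 0.12 = (z-1.2)(z-0.1)$ fails it.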
ISBN: (print) 9783030034931; 9783030034924
Our research concentrates on ways to combine machine learning techniques for authorship attribution. Traditionally, research in authorship attribution has focused on the development of new base-classifiers (combinations of stylometric features and learning methods). The many base-classifiers developed for authorship attribution vary in accuracy and often propose different authors for a disputed document. In this research, we use the predictions of multiple base-classifiers as a knowledge base for learning the true author. We introduce and compare two novel methods that utilize multiple base-classifiers. In the Weighted Voting approach, each base-classifier supports an author in proportion to its accuracy in leave-one-out classification. In our Meta-learning approach, each base-classifier is treated as a feature, and the base-classifiers' predictions in leave-one-out cross-validation are used as training data from which machine learning methods produce an aggregated decision. We illustrate our results on a collection of 18th-century political writings. Anonymously written essays were common during this period, leading to frequent disagreements between scholars over their attribution.
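The Weighted Voting rule described above can be sketched in a few lines: each base-classifier adds its leave-one-out accuracy to the author it predicts, and the author with the largest total wins. The function name, classifier names, author labels, and accuracies used here are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, loo_accuracy):
    """Each base-classifier adds its leave-one-out accuracy to the author it predicts.

    predictions:  {classifier_name: predicted_author}
    loo_accuracy: {classifier_name: accuracy in leave-one-out classification}
    Returns the author with the largest total support."""
    support = defaultdict(float)
    for clf, author in predictions.items():
        support[author] += loo_accuracy[clf]
    return max(support, key=support.get)
```

With the hypothetical predictions `{"char_3grams_svm": "A", "function_words_nb": "B", "pos_bigrams_knn": "B"}` and accuracies 0.92, 0.41, and 0.38, author A receives 0.92 and author B 0.79, so the weighted vote returns A even though a plain majority would choose B. The Meta-learning approach would instead feed the same per-classifier predictions as features to a second-level learner.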