Sequence pattern mining aims to identify frequent sequences whose support exceeds a user-specified threshold. The present study uses this approach, based on sequential patterns, to estimate the heat stress of broilers from the resulting behavioural patterns. Experimental data were collected in a climate chamber, where broiler behaviour was recorded under thermoneutral (comfort) conditions, set as the standard, and under thermal stress (cold and heat). The Generalised Sequential Patterns (GSP) algorithm was used to evaluate the heat stress of broilers in the third and fourth weeks of growth. The results indicated that mining pattern sequences is a useful and straightforward technique for estimating the welfare of broiler chickens, allowing the identification of temporal relations between thermal stress and the consequent behaviour of the broiler. At temperatures 8 °C below the standard thermoneutral conditions, broilers remained lying down most of the time, walking only to the drinker and feeder trough. Broilers exposed to temperatures 8 °C above the standard thermoneutral conditions tended to decrease locomotor activity, indicating lower welfare status. © 2019 Published by Elsevier Ltd on behalf of IAgrE.
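The notion of support used above can be illustrated with a minimal sketch. It assumes each sequence element is a single behaviour label and checks a fixed list of candidates; it does not reproduce GSP's candidate generation. The behaviour logs, the candidate patterns, and the threshold are hypothetical and are not data from the study.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


# Hypothetical behaviour logs, one sequence per observed bird.
database = [
    ["lying", "walking", "drinking", "lying"],
    ["lying", "drinking", "eating"],
    ["walking", "eating", "lying"],
    ["lying", "walking", "eating", "drinking"],
]

min_support = 0.5  # user-specified threshold (assumed value)
candidates = [("lying", "drinking"), ("walking", "eating"), ("eating", "lying")]
frequent = [p for p in candidates if support(p, database) >= min_support]
```

With these made-up logs, the first two candidates meet the threshold and the third does not.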
Fault detection in industrial systems plays a core role in improving their safety and productivity and in avoiding expensive maintenance. This paper proposes and verifies data-driven anomaly detection schemes based on a nonlinear latent variable model and statistical monitoring algorithms. By integrating the desirable characteristics of partial least squares (PLS) and adaptive neural network fuzzy inference systems (ANFIS), the PLS-ANFIS model allows flexible modeling of multivariable nonlinear processes. Furthermore, PLS-ANFIS modeling is combined with k-nearest neighbors (kNN)-based data mining schemes and employed for nonlinear process monitoring. Specifically, residuals generated from the PLS-ANFIS model are used as the input to the kNN-based mechanism to uncover anomalies in the data. Moreover, kNN-based exponential smoothing with parametric and nonparametric thresholds is adopted to improve anomaly detection. The effectiveness of the proposed approach is evaluated using real measurements from an actual bubble cap distillation column.
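The residual-monitoring step described above can be sketched as follows. This is an illustration only: the PLS-ANFIS model is not shown, the residual vectors are made up, and the threshold rule (the largest smoothed score observed on fault-free data) is one assumed nonparametric choice, not necessarily the one used in the paper.

```python
import math


def knn_distance(x, reference, k):
    """Mean Euclidean distance from x to its k nearest reference points."""
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k]) / k


def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    smoothed, prev = [], values[0]
    for v in values:
        prev = alpha * v + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


# Hypothetical fault-free residuals (model output minus measurement).
reference = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1), (0.05, 0.05)]
# Hypothetical new residuals; the last two mimic a fault.
incoming = [(0.0, 0.0), (0.05, -0.05), (1.0, 1.2), (1.1, 0.9)]

k, alpha = 2, 0.5  # assumed tuning values

# Leave-one-out kNN scores on the fault-free data set the threshold.
train_scores = [
    knn_distance(x, reference[:i] + reference[i + 1:], k)
    for i, x in enumerate(reference)
]
threshold = max(ewma(train_scores, alpha))

test_scores = [knn_distance(x, reference, k) for x in incoming]
flags = [s > threshold for s in ewma(test_scores, alpha)]
```

Smoothing the kNN scores before thresholding trades some detection delay for fewer false alarms caused by isolated noisy samples.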
To overcome the inadequate classification accuracy of existing methods for mining fake cybersecurity threat intelligence, and the lack of high-quality public datasets for training classification models, we propose a novel approach. We improve the attention mechanism and design a generative adversarial network based on it to generate fake cybersecurity threat intelligence. Additionally, we refine text tokenization techniques and design a model to detect fake cybersecurity threat intelligence. On our STIX-CTIs dataset, our method achieves an accuracy of 96.1%, outperforming current text classification models. Using our generated fake cybersecurity threat intelligence, we successfully mimic data poisoning attacks within open-source communities. When paired with our detection model, this research not only improves detection accuracy but also provides a tool for enhancing the security and integrity of open-source ecosystems.
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of pattern growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PrefixSpan, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts a vertical data format), and that PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
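The recursive projection described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes every sequence element is a single item, it copies suffixes physically rather than using pseudoprojection, and the function name and toy database are hypothetical.

```python
from collections import Counter


def prefix_growth(database, min_count, prefix=()):
    """Yield (pattern, count) for every pattern found in at least min_count sequences."""
    # Count each item once per sequence: these are the locally frequent items.
    counts = Counter()
    for seq in database:
        counts.update(set(seq))

    for item, count in sorted(counts.items()):
        if count < min_count:
            continue
        pattern = prefix + (item,)
        yield pattern, count
        # Projected database: the suffix after the first occurrence of `item`.
        projected = [seq[seq.index(item) + 1:] for seq in database if item in seq]
        yield from prefix_growth(projected, min_count, pattern)


# Hypothetical toy sequence database.
database = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
patterns = dict(prefix_growth(database, min_count=2))
```

Each recursive call sees only the suffixes that follow the current prefix, so the projected databases shrink as patterns grow and no candidate is generated that does not occur in the data.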
We present a classification algorithm built on our adaptation of the Generalized Lotka-Volterra model, which is well known in mathematical ecology. The training algorithm itself consists only of computing several scalars for each training vector, using a single global user parameter, and then solving a linear system of equations. Construction of the system matrix is driven by our model and based on kernel functions. The model offers an interesting view of the role of kernels in the inductive learning process. We describe the model through axiomatic postulates. Finally, we present the results of preliminary validation experiments.
Traditional pattern growth-based approaches for sequential pattern mining derive length-(k + 1) patterns based on the projected databases of length-k patterns recursively. At each level of recursion, they unidirectionally grow the length of detected patterns by one along the suffix of detected patterns, which needs k levels of recursion to find a length-k pattern. In this paper, a novel data structure, UpDown Directed Acyclic Graph (UDDAG), is introduced for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns. Thus, a length-k pattern can be detected in $\lfloor \log_2 k + 1 \rfloor$ levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is large such that the average pattern length is close to 1, UDDAG and PrefixSpan have similar performance because the problem degrades into a frequent item counting problem. However, UDDAG scales up much better. It often outperforms PrefixSpan by almost one order of magnitude in scalability tests. UDDAG is also considerably faster than Spade and LapinSpam. Except for extreme cases, UDDAG uses comparable memory to that of PrefixSpan and less memory than Spade and LapinSpam. Additionally, the special feature of UDDAG enables its extension toward applications involving searching in large spaces.
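To make the best-case recursion depth concrete, the short sketch below compares the k levels needed by unidirectional growth with the floor(log2 k + 1) levels stated for bidirectional growth. The function names are illustrative only and do not come from the paper.

```python
def unidirectional_levels(k: int) -> int:
    """Levels of recursion when patterns grow by one item per level."""
    return k


def bidirectional_best_case_levels(k: int) -> int:
    """floor(log2(k) + 1) for k >= 1, computed exactly with integer arithmetic."""
    if k < 1:
        raise ValueError("pattern length must be at least 1")
    # For positive integers, bit_length() equals floor(log2(k)) + 1.
    return k.bit_length()


# (pattern length, unidirectional levels, best-case bidirectional levels)
comparison = [
    (k, unidirectional_levels(k), bidirectional_best_case_levels(k))
    for k in (1, 8, 100)
]
```

For a length-100 pattern, for example, this gives 7 levels in the best case against 100 for one-item-at-a-time growth.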
Class imbalance occurs in classification problems in which the "normal" cases, or instances, significantly outnumber the "abnormal" instances. Training a standard classifier on imbalanced data leads to predictive biases which cause poor performance on the class(es) with lower prior probabilities. The less frequent classes are often critically important events, such as system failure or the occurrence of a rare disease. As a result, the class imbalance problem has been considered to be of great importance for many years. In this paper, we propose a novel algorithm, SMOTEFUNA, that utilizes the furthest neighbor of a candidate example to generate new synthetic samples. A key advantage of SMOTEFUNA over existing methods is that it does not have parameters to tune (such as K in SMOTE). Thus, it is significantly easier to utilize in real-world applications. We evaluate the benefit of resampling with SMOTEFUNA against state-of-the-art methods including SMOTE, ADASYN and SWIM using Naive Bayes and Support Vector Machine classifiers. We also provide a statistical analysis based on the Wilcoxon signed-rank test to validate the significance of the SMOTEFUNA results. The results indicate that the proposed method is an efficient alternative to the current methods. Specifically, SMOTEFUNA achieves better 5-fold cross-validated performance in ROC and precision-recall space.
We propose a general mechanism to represent spatial transactions in a way that allows the use of existing data mining methods. Our proposal allows the analyst to exploit the layered structure of geographical information systems in order to define the layers of interest and the relevant spatial relations among them. Given a reference object, it is possible to describe its neighborhood by considering the attributes of the object itself and of the objects related to it by the chosen relations. The resulting spatial transactions may either be treated like "traditional" transactions, by considering only the qualitative spatial relations, or their spatial extension can be exploited during the data mining process. We explore both cases. First, we tackle the problem of classifying a spatial dataset by taking into account the spatial component of the data to compute the statistical measure (i.e., the entropy) necessary to learn the model. Then, we consider the task of extracting spatial association rules by focusing on the qualitative representation of the spatial relations. The feasibility of the process has been tested by implementing the proposed method on top of a GIS tool and by analyzing real-world data.
In this paper, the problem of Contiguous Item Sequential Pattern (CISP) mining is presented as a sequential pattern mining problem under two constraints. First, each element in a sequence consists of only one item. Second, the items of a pattern must appear adjacent to one another, and in the same order as in the pattern, in every sequence that contains it. Even though the problem of CISP mining can be solved by using previous approaches to sequential pattern mining under a general constraint description framework, this may lead to poor performance due to the large search space. To solve this problem efficiently, a new data structure, UpDown Tree, is proposed for CISP mining. The UpDown Tree based approach can greatly improve the efficiency of CISP mining in terms of both time and memory compared to previous approaches. An extensive experimental study has shown promising results with our approach.
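The two CISP constraints can be made concrete with a naive containment check. This sketch only illustrates what it means for a sequence to contain a contiguous pattern; it is a brute-force scan, not the UpDown Tree approach, and the toy database is hypothetical.

```python
def contains_contiguous(pattern, sequence):
    """True if `pattern` occurs in `sequence` as an unbroken run of adjacent items."""
    m = len(pattern)
    pattern = tuple(pattern)
    return any(
        tuple(sequence[i:i + m]) == pattern
        for i in range(len(sequence) - m + 1)
    )


def contiguous_support(pattern, database):
    """Number of sequences in `database` containing `pattern` contiguously."""
    return sum(contains_contiguous(pattern, seq) for seq in database)


# Hypothetical database: every element is a single item (first constraint).
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["b", "c", "d", "a"],
]

bc_support = contiguous_support(["b", "c"], database)
ad_support = contiguous_support(["a", "d"], database)
```

Here `["a", "d"]` is an ordinary subsequence of the first two sequences, but it never occurs contiguously, so its CISP support is zero.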
This case report describes the design and build of an algorithm that integrates available data from separate hospital-based informatics systems, each of which performs a different daily function, in order to augment the contact-tracing process for COVID-19 patients by identifying exposed neighboring patients and healthcare workers and assessing their risk. Prior to the establishment of the algorithm, contact-tracing teams comprising 6 members would spend up to 10 hours each to complete contact tracing for 5 new COVID-19 patients. With the augmentation by the algorithm, a time-motion study and Monte Carlo simulation showed savings of ≥60% in the overall man-hours needed for contact tracing when there were 5 or more new cases per day. This improvement to the hospital's contact-tracing process supported more expeditious and comprehensive downstream contact-tracing activities as well as improved manpower utilization in contact tracing.