Exhaustive methods of pattern extraction in a database face real obstacles to speed and output control of patterns: a large number of patterns are extracted, many of which are redundant. Pattern extraction methods thr...
详细信息
Exhaustive methods of pattern extraction in a database face real obstacles to speed and output control of patterns: a large number of patterns are extracted, many of which are redundant. Pattern extraction methods through sampling, which allow for controlling the size of the outputs while ensuring fast response times, provide a solution to these two problems. However, these methods do not provide high-quality patterns: they return patterns that are very infrequent in the database. Furthermore, they do not scale. To ensure more frequent and diversified patterns in the output, we propose integrating compression methods into sampling to select the most representative patterns from the sampled transactions. We demonstrate that our approach improves the state of the art in terms of diversity of produced patterns.
Activation of the autonomic nervous system is a primary characteristic of human hedonic responses to sensory stimuli. For smells, general tendencies of physiological reactions have been described using classical stati...
详细信息
Activation of the autonomic nervous system is a primary characteristic of human hedonic responses to sensory stimuli. For smells, general tendencies of physiological reactions have been described using classical statistics. However, these physiological variations are generally not quantified precisely;each psychophysiological parameter has very often been studied separately and individual variability was not systematically considered. The current study presents an innovative approach based on data mining, whose goal is to extract knowledge from a dataset. This approach uses a subgroup discovery algorithm which allows extraction of rules that apply to as many olfactory stimuli and individuals as possible. These rules are described by intervals on a set of physiological attributes. Results allowed both quantifying how each physiological parameter relates to odor pleasantness and perceived intensity but also describing the participation of each individual to these rules. This approach can be applied to other fields of affective sciences characterized by complex and heterogeneous datasets.
Process mining utilises process execution data to discover and analyse business processes. Event logs represent process executions, providing information about the activities executed. In addition to generic event att...
详细信息
Process mining utilises process execution data to discover and analyse business processes. Event logs represent process executions, providing information about the activities executed. In addition to generic event attributes like activity name and timestamp, events might contain domain-specific attributes, such as a blood sugar measurement in a healthcare environment. Many of these values change during a typical process quite frequently. We refer to those as dynamic event attributes. Change patterns can be derived from dynamic event attributes, describing if the attribute values change from one activity to another. So far, change patterns can only be identified in an isolated manner, neglecting the chance of finding co-occuring change patterns. This paper provides an approach to identifying relationships between change patterns by utilising correlation methods from statistics. We applied the proposed technique on two event logs derived from the MIMIC-IV real-world dataset on hospitalisations in the US and evaluated the results with a medical expert. It turns out that relationships between change patterns can be detected within the same directly or eventually follows relation and even beyond that. Further, we identify unexpected relationships that are occurring only at certain parts of the process. Thus, the process perspective reveals novel insights on how dynamic event attributes change together during process execution. The approach is implemented in Python using the PM4Py framework.
Process mining (PM) aims to construct, from event logs, process maps that can help discover, automate, improve, and monitor organizational processes. Robotic process automation (RPA) uses software robots to perform so...
详细信息
Process mining (PM) aims to construct, from event logs, process maps that can help discover, automate, improve, and monitor organizational processes. Robotic process automation (RPA) uses software robots to perform some tasks usually executed by humans. It is usually difficult to determine what processes and steps to automate, especially with RPA. PM is seen as one way to address such difficulty. This paper aims to assess the applicability of process mining in accelerating and improving the implementation of RPA, along with the challenges encountered throughout project lifecycle. A systematic literature review was conducted to examine the approaches where PM techniques were used to understand the as-is processes that can be automated with software robots. Seven databases were used to identify papers on this topic. A total of 32 papers, all published since 2018, were selected from 605 unique candidate papers and then analyzed. There is a steady increase in the number of publications in this domain, especially during the year 2022, which suggests a raising interest in the combined use of PM with RPA. The literature mainly focuses on the methods to record the events that occur at the level of user interactions with the application, and on the preprocessing methods that are needed to discover routines with the steps that can be automated. Important challenges are faced with preprocessing such event logs, and many lifecycle steps of automation projects are weakly supported by existing approaches suggesting corresponding research areas in need of further attention.
As the development of the Internet and IT technology, short-text based communication is so popular compared with voice based one. Chat-based communication enables rapid, short and massive exchange of message with many...
详细信息
As the development of the Internet and IT technology, short-text based communication is so popular compared with voice based one. Chat-based communication enables rapid, short and massive exchange of message with many people, creates new social problems. 'Cross-texting' is one of them. It refers to accidentally sending a text to an unintended person during the concurrent conversations with separated multiple people. Cross-texting would be a serious problem in languages where respectful expressions are required. As text-based communication is getting popular, it is a crucial work to prevent cross-texting by detecting it in advance in languages with honorifics expression such as Korean. In this paper, we proposed two methods detecting a cross-text using a deep learning model. The first model is the formal feature vector, which models dialog by explicitly defining the politeness and completeness features. The second one is the grpah2vec based ChatGram-net model, which models the dialog based on the syllable occurrence relationship. To evaluate the detection performance, we suggest a generating method for cross-text datasets from a actual messenger corpus. In experiment we show that both proposed models detected cross-text effectively, and exceeded the performance of the baseline models.
Among the most popular collaborative filtering algorithms are methods based on the K nearest neighbors (KNN). In their basic operation, KNN methods consider a fixed number of neighbors to make recommendations. However...
详细信息
Among the most popular collaborative filtering algorithms are methods based on the K nearest neighbors (KNN). In their basic operation, KNN methods consider a fixed number of neighbors to make recommendations. However, it is not easy to choose an appropriate number of neighbors. Thus, it is generally fixed by calibration to avoid inappropriate values which would negatively affect the accuracy of the recommendations. In the literature, some authors have addressed the problem of dynamically finding an appropriate number of neighbors. But they use additional parameters which limit their proposals because these parameters also require calibration. In this paper, we propose a parameter-free KNN method for rating prediction. It is able to dynamically select an appropriate number of neighbors to use. The experiments that we did on four publicly available datasets demonstrate the efficiency of our proposal. It rivals those of the state of the art in their best configurations.
Personalised analytics is a powerful technology that can be used to improve the career, lifestyle, and health of individuals by providing them with an in-depth analysis of their characteristics as compared to other pe...
详细信息
Personalised analytics is a powerful technology that can be used to improve the career, lifestyle, and health of individuals by providing them with an in-depth analysis of their characteristics as compared to other people. Existing research has often focused on mining general patterns or clusters, but without the facility for customisation to an individual's needs. It is challenging to adapt such approaches to the personalised case, due to the high computational overhead they require for discovering patterns that are good across an entire dataset, rather than with respect to an individual. In this paper, we tackle the challenge of personalised pattern mining and propose a query-driven approach to mine objects with subspace similarity. Given a query object in a categorical dataset, our proposed algorithm, PRESS (Personalised Subspace Similarity), determines the top-k groups of objects, where each group has high similarity to the query for some particular subspace. We evaluate the efficiency and effectiveness of our approach on both synthetic and real datasets.
In the research of P2P lending data, the study of borrower characteristics is of great value for the establishment of target customers and risk management. Because of high dimensionality, mixed attributes, different i...
详细信息
In the research of P2P lending data, the study of borrower characteristics is of great value for the establishment of target customers and risk management. Because of high dimensionality, mixed attributes, different importance, and different generation time of information, P2P lending data often leads to the mining results unable to reflect the important borrower characteristics that affect the approval results and the approval loan amount. In this article, we are the first to propose the attributes partition of lending data considering the business process to classify variables into different types. Furthermore, we propose a multiangle data mining method for lending data by attributes partition considering the business process to discover the characteristics of P2P borrowers from multiple perspectives. Experimental results on the real dataset demonstrate that the method depicts the important characteristics of borrowers that affect the approval results and the loan amount, makes the research on P2P borrower characteristics more comprehensive and specific, and provides new ideas for the research on high-dimensional lending data.
The research of traffic revitalization index can provide support for the formulation and adjustment of policies related to urban management, epidemic prevention and resumption of work and production. This paper propos...
详细信息
The research of traffic revitalization index can provide support for the formulation and adjustment of policies related to urban management, epidemic prevention and resumption of work and production. This paper proposes a deep model for the prediction of urban Traffic Revitalization Index (DeepTRI). The DeepTRI builds model for the data of COVID-19 epidemic and traffic revitalization index for major cities in China. The location information of 29 cities forms the topological structure of graph. The Spatial Convolution Layer proposed in this paper captures the spatial correlation features of the graph structure. The special Graph Data Fusion module distributes and fuses the two kinds of data according to different proportions to increase the trend of spatial correlation of the data. In order to reduce the complexity of the computational process, the Temporal Convolution Layer replaces the gated recursive mechanism of the traditional recurrent neural network with a multi-level residual structure. It uses the dilated convolution whose dilation factor changes according to convex function to control the dynamic change of the receptive field and uses causal convolution to fully mine the historical information of the data to optimize the ability of long-term prediction. The comparative experiments among DeepTRI and three baselines (traditional recurrent neural network, ordinary spatial-temporal model and graph spatial-temporal model) show the advantages of DeepTRI in the evaluation index and resolving two under-fitting problems (under-fitting of edge values and under-fitting of local peaks).
The actionable behavioral rules suggest specific actions that may influence certain behavior in the stakeholders' best interest. In mining such rules, it was assumed previously that all attributes are categorical ...
详细信息
The actionable behavioral rules suggest specific actions that may influence certain behavior in the stakeholders' best interest. In mining such rules, it was assumed previously that all attributes are categorical while the numerical attributes have been discretized in advance. However, this assumption significantly reduces the solution space, and thus hinders the potential of miningalgorithms, especially when the numerical attributes are prevalent. As the numerical data are ubiquitous in business applications, there is a crucial need for new mining methodologies that can better leverage such data. To meet this need, in this paper, we define a new data mining problem, named behavior action mining, as a problem of continuous variable optimization of expected utility for action. We then develop three approaches to solving this new problem, which uses regression as a technical basis. The experimental results based on a marketing dataset demonstrate the validity and superiority of our proposed approaches.
暂无评论