The proposed work aims to solve the data sparsity problem in recommendation systems. It applies two-level pre-processing techniques to reduce the data size at the item level. Additional resources such as item genre, tag, and time are added to learn and analyse user preference behaviour in depth. The advantage of the proposed method is that it recommends items based on the user's interest pattern and avoids recommending outdated items. User information is grouped based on similar item genre and tag features. This effectively handles the overlapping conditions that exist in an item's genre, as an item has more than one genre at the initial level. Further, based on time, it analyses the user's non-static interest. Overall, it reduces the dimensions, which is an initial way to prepare data for analysing hidden patterns. To enhance performance, the proposed method utilizes Apache Spark MLlib's FP-Growth and an association rule mining approach in a distributed environment. To reduce the computation cost of constructing the tree in FP-Growth, the candidate data set is stored in matrix form. The experiments were conducted using the MovieLens data set. The observed results show that the proposed method achieves a 4% increase in accuracy compared to earlier methods. (c) 2020 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/4.0/).
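To make the terms "frequent itemset" and "association rule" concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract uses Spark MLlib's FP-Growth, whereas this sketch swaps in brute-force enumeration, which only suits tiny inputs. The function names, the genre transactions, and both thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Brute-force enumeration (assumption: tiny data); FP-Growth would
    avoid enumerating all subsets by building a prefix tree instead.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so the lookup below always succeeds.
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical transactions: genres of items one user group interacted with.
transactions = [
    ["comedy", "romance"],
    ["comedy", "romance", "drama"],
    ["comedy", "drama"],
    ["action"],
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.6)
```

With these toy inputs, a rule such as romance -> comedy reaches confidence 1.0 because every transaction containing "romance" also contains "comedy".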
Message format extraction, the process of revealing the message syntax without access to the protocol specification, is important for a variety of applications such as service virtualization and network security. In this paper, we propose P-token, which mines fine-grained message formats from network traces. The novelty of our approach is twofold: a 'positional keyword' identification technique and a two-level hierarchical clustering strategy. Positional keywords are based on the insight that keywords or reserved words usually occur at relatively fixed positions in the messages. By associating positions as meta-information with keywords, we can more accurately distinguish keywords from message payload data. After identification, the positional keywords are used as features to cluster the messages using density peaks clustering. We then perform another level of clustering to refine the clusters with low homogeneity. Finally, the message format of each cluster is extracted based on the observed ordering of keywords. P-token improves on the current state-of-the-art techniques by successfully addressing two challenges that commonly afflict existing keyword based format extraction methods: message keyword mis-identification and message format over-generalization. We have conducted experiments on services and applications using various protocols, including SOAP, LDAP, IMS and a RESTful service. Our experimental results show that P-token outperforms existing methods in extracting message formats. (C) 2019 Elsevier B.V. All rights reserved.
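The positional-keyword insight can be illustrated with a small sketch. It assumes whitespace tokenization and a simple frequency threshold, neither of which is stated in the abstract, and it covers only the identification idea, not the density peaks clustering or format extraction steps. The function name `positional_keywords`, the `min_ratio` parameter, and the sample messages are hypothetical.

```python
from collections import Counter


def positional_keywords(messages, min_ratio=0.5):
    """Return (position, token) pairs seen in at least min_ratio of messages.

    Assumption: tokens are whitespace-separated and position is the token
    index; a real protocol would need a protocol-aware tokenizer.
    """
    counts = Counter()
    for msg in messages:
        for pair in enumerate(msg.split()):
            counts[pair] += 1
    threshold = min_ratio * len(messages)
    return {pair for pair, c in counts.items() if c >= threshold}


# Hypothetical messages; "GET" appears once as payload at position 2.
messages = [
    "GET user alice",
    "GET user bob",
    "PUT user GET",
    "GET item 42",
]
keywords = positional_keywords(messages)
```

Because the position is part of the feature, "GET" at position 0 is kept as a keyword while the payload "GET" at position 2 is not, which is the distinction the abstract describes.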
Driving behavior is how drivers respond to actual driving environments and is a major factor in road traffic safety. Recent advances in in-vehicle sensors facilitate continuous monitoring of driving behaviors, and large-scale driving data have been accumulated. This study develops a framework to evaluate large-scale driving records and to establish clusters that can be used to identify potentially aggressive driving behaviors. The framework employs three steps of data analytic methods: abrupt change detection to extract meaningful driving events from raw data, feature extraction using an auto-encoder, and two-level clustering. This framework is applied to real driving data that were obtained from 43 taxis in Korean metropolitan cities. The application shows that the framework can characterize driving patterns from large-scale driving records and identify clusters with high potential for aggressive driving. The findings imply that the outcome clusters represent the norm of driving behavior and thus can be used as a reference in diagnosing other drivers' behavior. (C) 2017 Elsevier Ltd. All rights reserved.
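The abstract does not say which abrupt change detection method the framework uses. As a rough illustration of the first step only, the sketch below swaps in a first-difference threshold on a speed series; the function name, the sample speeds, and the threshold value are all hypothetical.

```python
def abrupt_changes(speeds, threshold):
    """Return indices i where |speeds[i] - speeds[i-1]| exceeds threshold.

    Assumption: a fixed threshold on consecutive differences; this is a
    stand-in, not the detection method used in the study.
    """
    return [
        i
        for i in range(1, len(speeds))
        if abs(speeds[i] - speeds[i - 1]) > threshold
    ]


# Hypothetical speed samples in km/h at a fixed sampling interval.
speeds_kmh = [50, 51, 52, 40, 39, 38, 55]
events = abrupt_changes(speeds_kmh, threshold=10)
```

Each returned index marks a candidate driving event (hard braking or sharp acceleration) that later steps could encode and cluster.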
The self-organizing map (SOM) methodology performs vector quantization and clustering on the dataset, and then projects the obtained clusters to a lower-dimensional space, such as a 2D map, by positioning similar clusters in locations that are spatially closer in the lower-dimensional space. This makes the SOM methodology an effective tool for data visualization. However, in a world where information mined from big data has to be available immediately, SOM becomes an unattractive tool because of its time complexity. In this paper, we propose an alternative visualization methodology for large datasets that emulates the SOM methodology without the speed constraints inherent to SOM. To demonstrate the efficiency and the potential of the proposed scheme as a fast visualization tool, the methodology is used to cluster and project the 3,823 image samples of handwritten digits of the Optical Recognition of Handwritten Digits dataset. Although the dataset is not, by any means, large, it is sufficient to demonstrate the speed-up that can be achieved by using this proposed SOM emulation procedure. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/4.0/).
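For context, the following sketch shows one training step of a standard SOM, the baseline the abstract describes, not the proposed emulation procedure. The grid representation (a dict from grid coordinates to weight vectors), the Gaussian neighbourhood, and the parameter values are assumptions chosen for brevity.

```python
import math


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))


def som_update(weights, x, lr=0.5, sigma=1.0):
    """Apply one SOM step and return (bmu, updated weights).

    Every unit moves toward x, scaled by a Gaussian of its grid distance
    to the best matching unit, so neighbouring units end up similar.
    """
    bmu = best_matching_unit(weights, x)
    updated = {}
    for pos, w in weights.items():
        grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        h = math.exp(-grid_d2 / (2 * sigma ** 2))
        updated[pos] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu, updated


# Hypothetical 1x2 grid with 2-dimensional weight vectors.
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
bmu, new_weights = som_update(weights, [0.9, 0.9])
```

Repeating this step over every sample for many epochs is what makes standard SOM training costly on large datasets, which is the cost the abstract aims to avoid.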
The exponential growth of data produces very large, terabyte-scale databases. The growing number of data dimensions and data objects presents tremendous challenges for effective data analysis and data exploration methods and tools. Thus, it becomes crucial to have methods able to construct a condensed description of the properties and structure of data, as well as visualization tools capable of representing the data structure from these condensed descriptions. The purpose of the work described in this paper is to develop a method of describing data from enriched and segmented prototypes using a topological clustering algorithm. We then introduce a visualization tool that can enhance the structure within and between groups in data. We show, using some artificial and real databases, the relevance of the proposed approach. (C) 2012 Elsevier Ltd. All rights reserved.