High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in industry, the collected data is often so large that it cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to guarantee the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation for mining the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability, compared to the serial HUSP-Span approach.
High-utility sequential pattern mining (HUSPM) can be applied in many domains, such as retail, market basket analysis, click-stream analysis, healthcare data analysis, and bioinformatics. HUSPM algorithms discover useful information from data. On the dark side, however, sensitive patterns can also be discovered by competitors who run a HUSPM algorithm on leaked data. High-utility sequential pattern hiding (HUSPH) is therefore used to protect private information from HUSPM algorithms. This paper proposes three algorithms for hiding high-utility sequential patterns in quantitative sequence datasets: High Utility Sequential Pattern Hiding Using Pure Array Structure (USHPA), High Utility Sequential Pattern Hiding Using Parallel Strategy (USHP), and High Utility Sequential Pattern Hiding Using Random Distribution Strategy (USHR). These algorithms use a proposed data structure named Pattern Utility Set for Hiding (PUSH) to speed up the hiding process. We also introduce a metric called Privacy Factor to evaluate the quality of the hiding results. Comparative experiments were conducted on real datasets to evaluate the performance of the proposed algorithms in terms of runtime, memory consumption, scalability, missing cost, and privacy factor. The results show that the proposed algorithms can efficiently sanitize the input datasets and that they outperform the compared algorithms on all metrics. (C) 2021 Elsevier B.V. All rights reserved.
ISBN: (print) 9789811982330; 9789811982347
Frequent closed high-utility (FCHU) sequences are preferable to frequent closed sequences. Besides their utility-based nature, which contributes considerably to making decisive business decisions, FCHU sequences also preserve the information needed to reconstruct frequent high-utility sequences. Despite their vital role, mining FCHU sequences is time-consuming on large-scale datasets, especially when the input thresholds are relatively small. To address these difficulties, this paper proposes a parallel algorithm named P-FCloHUS for fast mining of FCHU sequences that makes good use of multi-core processors. By relying on a novel Single scan synchronization strategy, facilitated by an efficient Partitioned result space structure, P-FCloHUS reduces the communication cost between mining tasks and hence speeds up the parallel mining process. Experiments on both dense and sparse datasets show that P-FCloHUS outperforms the state-of-the-art FMaxCloHUSM in terms of runtime.
High utility sequential patterns (HUSP) are a type of pattern that can be found in data collected in many domains, such as business, marketing, and retail. Two critical topics related to HUSP are HUSP mining (HUSPM) and HUSP hiding (HUSPH). HUSPM algorithms are designed to discover all sequential patterns in a sequence database whose utility is greater than or equal to a minimum utility threshold. HUSPH algorithms, by contrast, conceal all HUSP so that competitors cannot find them in shared databases. This paper focuses on HUSPH. It proposes an algorithm named HUS-Hiding to efficiently hide all HUSP. An extensive experimental evaluation is conducted on six real-life datasets to evaluate the performance of the proposed algorithm. According to the experimental results, the designed algorithm is more effective than three state-of-the-art algorithms in terms of runtime, memory usage, and hiding accuracy. (C) 2018 Elsevier Inc. All rights reserved.
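The threshold test mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not taken from any of the listed papers. It assumes the definition commonly used in the HUSPM literature: the utility of a pattern in one quantitative sequence is the maximum, over all of its occurrences, of the summed utilities of the matched items, and its utility in the database is the sum over all sequences. The names `sequence_utility`, `pattern_utility`, `is_high_utility`, and `example_db` are hypothetical, and the utility values are made up for illustration.

```python
from functools import lru_cache

def sequence_utility(pattern, qseq):
    """Maximum utility over all occurrences of `pattern` in `qseq`.

    `pattern` is a list of item sets; `qseq` is a list of dicts mapping
    item -> utility (assumed to be quantity times unit profit already).
    Returns None when the pattern does not occur in the sequence.
    """
    n, m = len(pattern), len(qseq)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best utility for matching pattern[i:] inside qseq[j:].
        if i == n:
            return 0          # whole pattern matched
        if j == m:
            return None       # sequence exhausted before the pattern
        result = best(i, j + 1)  # option 1: skip itemset j
        if all(item in qseq[j] for item in pattern[i]):
            rest = best(i + 1, j + 1)  # option 2: match pattern[i] here
            if rest is not None:
                gained = sum(qseq[j][item] for item in pattern[i]) + rest
                if result is None or gained > result:
                    result = gained
        return result

    return best(0, 0)

def pattern_utility(pattern, database):
    """Sum of per-sequence utilities over the sequences containing the pattern."""
    total = 0
    for qseq in database:
        u = sequence_utility(pattern, qseq)
        if u is not None:
            total += u
    return total

def is_high_utility(pattern, database, min_util):
    """True when the pattern's utility reaches the minimum utility threshold."""
    return pattern_utility(pattern, database) >= min_util

# Made-up quantitative sequence database (illustration only).
example_db = [
    [{"a": 2, "b": 6}, {"a": 4, "c": 3}, {"b": 9}],
    [{"b": 3}, {"a": 5}, {"c": 1}],
]
```

Under these assumptions, a hiding algorithm would have to modify the database until `is_high_utility` returns False for every pattern it wants to conceal. How the listed algorithms choose those modifications is not described in the abstracts and is not sketched here.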