High-utility sequential pattern mining (HUSPM) has been an active research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM algorithms have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in industry, the collected data are often too large to fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to guarantee the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation of the required patterns. The experimental results show that the designed model performs well on large-scale datasets in terms of runtime, memory usage, efficiency with respect to the number of distributed nodes, and scalability, compared with the serial HUSP-Span approach.
ISBN: 9781538636497 (print)
The goal of subspace clustering is to find hidden clusters that exist in different subspaces of a dataset. In recent years, with the exponential growth of data size and dimensionality, traditional subspace clustering algorithms have become both inefficient and ineffective at extracting knowledge in big data environments. This creates an urgent need for efficient parallel distributed subspace clustering algorithms that can handle large multi-dimensional data at an acceptable computational cost. In this paper, we introduce MR-Mafia, a parallel MAFIA subspace clustering algorithm based on MapReduce. The algorithm takes advantage of MapReduce's data partitioning and task parallelism and achieves a good tradeoff between disk-access cost and communication cost. The experimental results show near-linear speedups and demonstrate the high scalability and promising application prospects of the proposed algorithm.
In the last few years, geometric semantic genetic programming has grown in popularity, obtaining interesting results on several real-life applications. Nevertheless, the large size of the solutions generated by geometric semantic genetic programming is still an issue, in particular for those applications in which reading and interpreting the final solution is desirable. In this thesis, a new parallel and distributed genetic programming system is introduced with the objective of mitigating this drawback. The proposed system (called MPHGP, which stands for Multi-Population Hybrid Genetic Programming) is composed of two types of subpopulations, one of which runs geometric semantic genetic programming, while the other runs a standard multi-objective genetic programming algorithm that optimizes, at the same time, the fitness and the size of solutions. The two subpopulations evolve independently and in parallel, exchanging individuals at predefined synchronization instants. The presented experimental results, obtained on five real-life symbolic regression applications, suggest that MPHGP is able to find solutions that are comparable to, or even better than, the ones found by geometric semantic genetic programming, both on training and on unseen testing data. At the same time, MPHGP is also able to find solutions that are significantly smaller than the ones found by geometric semantic genetic programming.
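As a rough illustration of what optimizing fitness and size "at the same time" can mean, the Python sketch below filters a set of candidate solutions down to its Pareto front. It assumes that each candidate is summarized as a hypothetical `(error, size)` pair and that both values are minimized. The abstract does not state which multi-objective scheme MPHGP uses, so this is only a generic example of the idea, not the thesis's method.

```python
from typing import List, Tuple

Candidate = Tuple[float, int]  # (error, size); both assumed to be minimized


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (O(n^2) scan)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    population = [(0.1, 50), (0.2, 10), (0.3, 30), (0.1, 60)]
    print(pareto_front(population))
```

In this toy population, `(0.3, 30)` is dropped because `(0.2, 10)` is both more accurate and smaller, and `(0.1, 60)` is dropped because `(0.1, 50)` has the same error with a smaller size.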
ISBN: 9781479904020; 9781479904037 (print)
An introduction is given to parallel and distributed simulation and to the modeling of telecommunications systems. Our practical modeling concept for simulation in a heterogeneous execution environment is presented. Its logical topology is a star-shaped network of homogeneous clusters. Load balancing and coupling factor criteria are set up for building models of telecommunications systems so that the simulation can achieve good speed-up in a heterogeneous distributed execution environment. A case study is given with the open source OMNeT++ discrete-event simulation system and its parallel CQN (closed queueing network) sample model, executed on 64 CPU cores of four different types. Our criteria are strongly supported by the results of our experiments.
This paper proposes a simulation framework suitable for holonic manufacturing systems (HMS), based on the concept of distributed self-simulation. An HMS is a distributed system that comprises autonomous and cooperative elements, called holons, for flexible and agile manufacturing. The simulation framework proposed here capitalizes on this distributed nature: each holon functions like an independent simulator with self-simulation capabilities, maintaining its own clock, handling events, detecting inter-holon state inconsistencies, and performing rollback actions. This paper discusses the detailed architecture and design issues of such a simulator and reports on the results of a prototype.
The reconfigurable mesh consists of an array of processors interconnected by a reconfigurable bus system. The bus system can be used to dynamically obtain various interconnection patterns among the processors. Recently, this model has attracted a lot of attention. In this paper, we show O(1) time solutions to the following computational geometry problems on the reconfigurable mesh: all-pairs nearest neighbors, convex hull, triangulation, two-dimensional maxima, two-set dominance counting, and smallest enclosing box. All these solutions accept N planar points as input and employ an N × N reconfigurable mesh. The basic scheme employed in our implementations is to recursively find an O(1) time solution. The number of recursion levels and the size of the subproblems at each level of recursion are optimized such that the problem decomposition and the solution to the problem can be obtained in constant time. As a result, we have developed some efficient merge techniques to combine the solutions for subproblems on the reconfigurable mesh. These techniques exploit reconfigurability in nontrivial ways, leading to constant-time solutions on a mesh of optimal size.
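To make one of the listed problems concrete, the Python sketch below computes the two-dimensional maxima of a point set sequentially. It assumes the usual definition, in which a point is maximal if no other point has both a greater-or-equal x and a greater-or-equal y coordinate. This is an ordinary O(N log N) sort-and-sweep given only as a reference for what the problem asks; it is not the constant-time reconfigurable-mesh algorithm described in the abstract, and the function name `maxima_2d` is illustrative.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def maxima_2d(points: Iterable[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point.

    Duplicates are collapsed first. Points are then visited from largest x
    to smallest (ties broken by largest y); a point is maximal exactly when
    its y exceeds every y seen so far.
    """
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    result: List[Point] = []
    best_y = float("-inf")
    for x, y in ordered:
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


if __name__ == "__main__":
    pts = [(1, 1), (2, 3), (3, 2), (0, 4), (2, 2)]
    print(maxima_2d(pts))
```

For the sample input, `(1, 1)` and `(2, 2)` are dominated by `(2, 3)`, so the maximal points are `(3, 2)`, `(2, 3)`, and `(0, 4)`.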