ISBN (print): 9783319606187; 9783319606170
Association rule mining is one of the prominent techniques for discovering relations between the data items of a transactional dataset. The mining process is simplified by considering only the frequent itemsets. Pincer search is a frequent itemset mining method that combines top-down and bottom-up search techniques to get the benefits of both. The top-down approach in Pincer search reduces the number of candidates in each pass and saves considerable computing resources. In this work, we present a parallel Pincer Search (PPS) based on a distributed implementation on the Spark framework. We adapted the search algorithm to the Spark framework so that it runs in parallel. Spark provides many features suited to iterative algorithms, such as in-memory execution, efficient data structures, and better fault tolerance. We implemented PPS on a Spark cluster, ran it on multiple datasets, and analysed the performance.
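As an illustration of the bottom-up half of such a search, the following is a minimal single-machine sketch of level-wise frequent itemset counting in plain Python. It is not the PPS algorithm of the abstract and does not use Spark: the top-down candidate set that Pincer search maintains is omitted, and the function names `frequent_itemsets` and `maximal` are hypothetical names chosen for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Bottom-up, level-wise search: return {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count the support of every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates and prune
        # any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the support of each candidate with a scan over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def maximal(frequent):
    """Keep only the itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}
```

The support-counting scan is the step that a distributed implementation would typically spread across partitions of the transaction data; how PPS does this on Spark is not stated in the abstract.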
ISBN (print): 9781728106762
This track has made a significant contribution to advancing the current state of the art in enterprise distributed application management across datacenters and clouds, public or private. A new computing model proposing extensions to the current von Neumann implementation of the Turing machine has been contributed through the efforts of several participants in this conference since 2009. In keeping with this tradition, this year a new paper demonstrates theory and practice that push the boundaries of the Church-Turing thesis and presents a novel low-latency, high-performance edge computing solution. In addition, nine full papers describe advances in current distributed and cloud computing practices dealing with quality of service, business intelligence, cloud security, the Internet of Things, and cloud performance and economics.
ISBN (print): 3540338098
The aim of the paper is to introduce general techniques for optimizing the parallel execution time of sorting on distributed architectures with processors of various speeds. Such an application requires a partitioning step. For uniformly related processors (processor speeds are related by a constant factor), we develop a constant-time technique for mastering processor load and execution time in a heterogeneous environment, as well as a technique to deal with unknown cost functions. For non-uniformly related processors, we use a technique based on dynamic programming. Most of the time, the solutions are in O(p) (p is the number of processors), independent of the problem size n. Consequently, the overhead is small for the problem we deal with, but it is inherently limited by knowledge of the time complexity of the portion of code following the partitioning.
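To make the partitioning step concrete, here is a minimal sketch that splits n items across processors in proportion to their relative speeds. It assumes integer relative speeds and a per-processor cost that is linear in the chunk size, which is a simplification: the abstract's own techniques for sorting costs, unknown cost functions, and non-uniformly related processors are not reproduced. The function name `partition_sizes` is hypothetical.

```python
def partition_sizes(n, speeds):
    """Split n items across processors in proportion to their relative speeds.

    Assumes positive integer speeds and a cost linear in the chunk size.
    """
    total = sum(speeds)
    # Each processor first gets the floor of its proportional share.
    sizes = [n * s // total for s in speeds]

    # Integer rounding loses fewer than len(speeds) items in total;
    # hand them out one each, fastest processors first.
    leftover = n - sum(sizes)
    order = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes
```

For example, with speeds `[1, 1, 2]` the fastest processor receives half of the data, so all three would finish at about the same time under the linear-cost assumption.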
ISBN (digital): 9783642254376
ISBN (print): 9783642254369
This paper proposes that a concept lattice can be changed on the basis of sub-lattices: the new concept lattice is composed of a series of sub-lattices, each isomorphic to the related sub-lattice of the original. The isomorphic relation, and the method for obtaining the new concept lattice from the original one by means of the new context and the arrow relation, are proposed and proved. Finally, we analyse why the concept lattice is suited to serve as an information exchange model, owing to a series of inherent, advanced properties consistent with distributed, parallel algorithms.
The main goal of this workshop is to provide a timely forum for the exchange and dissemination of new ideas, techniques and research in the field of the new parallel and distributed computational models. The workshop ...
ISBN (print): 9783540680819
Scientific computing applications with highly demanding data capacity and computation power requirements drive a computing platform migration from shared-memory machines to multi-core/multiprocessor computer clusters. However, overheads in coordinating operations across computing nodes can counteract the benefit of having extra machines. Furthermore, hidden dependencies in applications slow down simulation on non-shared-memory machines. This paper proposes a framework that utilizes multi-core/multiprocessor clusters for distributed simulation. Among several coordination schemes, the decentralized control approach has demonstrated its effectiveness in reducing communication overheads. A speculative execution strategy is applied to exploit parallelism thoroughly and overcome strong data dependency. Performance analysis and experiments are provided to demonstrate the performance gains.
Fuzzy Integral is compared with two other methods that are popular in the study of classifier fusion. The standard model of Fuzzy Integral and its general solution are introduced. Then, the state of the art and the...
ISBN (print): 0818678763
The R-tree is a very popular dynamic access structure capable of storing multidimensional and spatial data. Considering its merits of efficient global balance and dynamic reorganization, we use the R-tree to decluster multiattribute data in a database system or file system. As many previous multiattribute declustering mechanisms do not take into account the properties of the Cluster of Workstations (COW), we present the Global parallel R-Tree (GPR-Tree) under the COW architecture. First, we inspect the efficiency issues of the R-tree and its variants and try to enhance R-tree efficiency by using heuristic information when reconstructing the R-tree during node splitting and when treating the orphan entries of an underfilled node. Then we parallelize the improved R-tree among the components of the system. The basic idea is to alleviate the bottleneck effect of the I/O subsystem by making use of high-speed network communication and memory. The GPR-Tree is shared among the processing units (PUs) of the system. We use a mixed LRU algorithm to schedule pages in memory so that frequently visited nodes stay in memory. A write-update-like protocol keeps the multiple copies maintained in the system coherent. This mechanism is shown to improve the scalability and performance of the system.
ISBN (print): 9783030018214
Nowadays, as data size grows exponentially, it becomes more and more difficult to extract useful information in reasonable time. One very important technique for exploiting data is clustering, and many algorithms have been proposed, such as k-means and its variations (k-medians, kernel k-means, etc.), DBSCAN, OPTICS and others. The time complexity of all these methods is prohibitive (NP hard) for making decisions on time, and the solution is either to invent new, faster algorithms or to increase the performance of the old, well-tested ones. Distributed, parallel, and multi-core GPU computing, or even a combination of these platforms, constitutes a very promising way to speed up clustering techniques. In this paper, parallel versions of the above-mentioned algorithms were implemented in order to increase their performance and, consequently, their prospects in several fields such as industry, political/social sciences, telecommunications businesses, and intrusion detection in big networks. The parallel versions of the clustering techniques are presented here, and two different cases of their application in different fields are illustrated. The results obtained are very promising in terms of quality and performance, and therefore the prospects for using clustering techniques in industry and the sciences are improved.
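For readers unfamiliar with k-means, the following is a minimal sequential sketch of Lloyd's algorithm in plain Python. It is only an illustration under stated assumptions: points are tuples of floats, the caller supplies the initial centroids, and the function name `kmeans` is hypothetical. It is not the parallel implementation described in the abstract.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on tuples of floats, starting from given centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(c)  # an empty cluster keeps its centroid

        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters
```

The assignment step treats every point independently, which is why it is the part usually distributed across cores, GPU threads, or cluster nodes in parallel versions.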
ISBN (print): 9783642283079; 9783642283086
Distributed storage systems store data on "unreliable" network peers that can leave the system at any moment and whose network bandwidth is limited. In this case, the only way to assure the reliability of the data is to add redundancy using either replication or erasure codes. As a generalization of replication, erasure codes require less storage space for the same reliability. Recently, a near-optimal erasure code named Hierarchical Codes has been proposed that can significantly reduce repair traffic by reducing the number of nodes participating in a repair, which is referred to as the repair degree d. To overcome the complexity of reintegration and efficiently control the reliability of Hierarchical Codes, we refine two concepts called location and relocation, and then propose an integrated maintenance scheme that allows us to tune the code construction.
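The trade-off between storage space and repair degree can be seen in the simplest possible erasure code, a single XOR parity block over two data blocks. The sketch below is only that toy code, not Hierarchical Codes or the maintenance scheme of the abstract; the function names `encode` and `repair` are hypothetical.

```python
def encode(block_a: bytes, block_b: bytes) -> bytes:
    """Return the XOR parity of two equal-length data blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(block_a, block_b))


def repair(surviving: bytes, parity: bytes) -> bytes:
    """Rebuild the lost data block from the surviving block and the parity.

    XOR is its own inverse, so repair is the same operation as encode.
    """
    return encode(surviving, parity)
```

Storing two data blocks plus one parity block on three peers tolerates the loss of any one peer at 1.5 times the original size, whereas keeping two full copies tolerates one loss at 2 times the size. The price is the repair degree: rebuilding a lost block here requires reading from two peers, while replication needs only one.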