ISBN: 9781467399104 (print)
Mining association rules from large databases is one of the most important tasks in data mining. Nowadays, the majority of companies produce a significant amount of data stored in distributed databases. In this case, most of the traditional algorithms for mining association rules become ineffective because they require a lot of resources to extract the frequent patterns. Cloud computing technologies provide the infrastructure for handling such massive datasets. In this paper, we propose an extension of the Count Distribution algorithm for mining fuzzy association rules from a distributed database. The algorithm uses the MapReduce programming model, which distributes the mining process over many cluster nodes. Distributing the mining process allows handling very large databases and significantly improves the execution time.
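The counting step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that each transaction maps fuzzy items to membership degrees in [0, 1], that the degree of an itemset in a transaction is the minimum of its item memberships, and that each list in `partitions` stands in for the data held by one cluster node. The function names and the item labels are hypothetical.

```python
from collections import defaultdict


def map_partition(partition, candidates):
    """Map step: one node computes local fuzzy counts for every candidate itemset."""
    local = defaultdict(float)
    for transaction in partition:  # transaction: dict of fuzzy item -> membership degree
        for itemset in candidates:
            # Assumption: min t-norm; an absent item contributes degree 0.
            degree = min(transaction.get(item, 0.0) for item in itemset)
            if degree > 0.0:
                local[itemset] += degree
    return list(local.items())


def reduce_counts(mapped_outputs):
    """Reduce step: sum the local counts of each itemset across all nodes."""
    total = defaultdict(float)
    for pairs in mapped_outputs:
        for itemset, count in pairs:
            total[itemset] += count
    return dict(total)


def frequent_itemsets(partitions, candidates, min_support):
    """Return candidates whose global fuzzy support reaches min_support."""
    n_transactions = sum(len(p) for p in partitions)
    totals = reduce_counts(map_partition(p, candidates) for p in partitions)
    return {
        itemset: count / n_transactions
        for itemset, count in totals.items()
        if count / n_transactions >= min_support
    }


# Two partitions stand in for two nodes (illustrative data only).
partitions = [
    [{"age.young": 0.8, "income.high": 0.5}, {"age.young": 0.2}],
    [{"age.young": 1.0, "income.high": 0.5}, {"income.high": 1.0}],
]
candidates = [("age.young",), ("income.high",), ("age.young", "income.high")]
result = frequent_itemsets(partitions, candidates, min_support=0.3)
```

Only the per-itemset counts cross node boundaries in this scheme, which is the property that makes Count Distribution a natural fit for a map and reduce split.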
ISBN: 9781509045112 (print)
In this article, we report on a platform and architecture that make real-time analysis of big data possible, and on a structured IT infrastructure in which they are optimally combined. We developed a distributed architecture in which data conversion and abnormality determination are divided into multiple blocks. Furthermore, by selecting a distributed storage DB, we succeeded in constructing an IT infrastructure capable of high-speed processing at a large number of manufacturing sites. In the new IT infrastructure, we achieved resource leveling of the application server and improved data processing time. It is expected that data file stagnation and DB registration delay can be resolved.
ISBN: 9781538627051 (print)
The recent deployment of novel network concepts, such as M2M communication or IoT, has undoubtedly stimulated the placement of a new set of services, leveraging both centralized resources in Cloud Data Centers and distributed resources shared by devices at the edge of the network. Moreover, Fog Computing has recently been proposed, with one of its main assets being the reduction of service response time, further enabling the deployment of real-time services. Although QoS-aware network research originally focused on data plane issues, the successful deployment of real-time services, which demand very low delay in the allocation of distributed resources, depends on assessing the impact of control decisions on QoS. Recently, Fog-to-Cloud (F2C) computing has been proposed as a hierarchical layered architecture relying on coordinated and distributed management of both Fog and Cloud resources, enabling the distributed and parallel allocation of resources at distinct layers and thus suitably mapping service demands onto resource availability. In this paper, we assess the layered management architecture in F2C systems, taking into account its distributed nature. Preliminary results show the tradeoff observed between controller capacity, number of controllers, and number of controller layers in the F2C architecture.
ISBN: 9781509035137 (print)
The size and complexity of supercomputing clusters are rapidly increasing to cater to the needs of complex scientific applications. At the same time, the feature size and operating voltage level of the internal components are decreasing. This dual trend makes these machines extremely vulnerable to soft errors or random bit flips. For complex parallel applications, these soft errors can lead to silent data corruption, which could in turn cause large inaccuracies in the final computational results. Hence, it is important to determine the presence and severity of such errors early on, so that proper countermeasures can be taken. In this paper, we introduce a tool called Sirius, which can accurately identify silent data corruptions based on the simple insight that there exists spatial and temporal locality within most variables in such programs. Spatial locality means that values of the variable at nodes that are close by in a network sense are also close numerically. Similarly, temporal locality means that the values change slowly and in a continuous manner with time. Sirius uses neural networks to learn such locality patterns, separately for each critical variable, and produces probabilistic assertions which can be embedded in the code of the parallel program to detect silent data corruptions. We have implemented this technique on two parallel benchmark programs, LULESH and CoMD. Our evaluations show that Sirius can detect silent errors in the code with much higher accuracy than previously proposed methods. Sirius detected 98% of the silent data corruptions with a false positive rate of less than 0.02, compared with the false positive rate of 0.06 incurred by the state-of-the-art acceleration-based prediction (ABP) technique.
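The temporal-locality idea can be made concrete with a deliberately simple stand-in. The sketch below does not use neural networks and is not Sirius: it assumes a fixed linear-extrapolation predictor and a hand-picked tolerance, both hypothetical, and only shows how an assertion of the form "the next value should be close to what recent history predicts" can flag a suspicious value.

```python
def flag_temporal_anomalies(series, tolerance):
    """Return indices whose value departs from a linear extrapolation of history.

    Assumption: values change slowly and smoothly, so the next value is
    predicted as 2 * previous - one_before_previous.
    """
    flagged = []
    trusted = list(series[:2])  # the first two samples are taken as correct
    for t in range(2, len(series)):
        predicted = 2 * trusted[-1] - trusted[-2]
        if abs(series[t] - predicted) > tolerance:
            flagged.append(t)
            # Keep the prediction, not the suspect value, so that a single
            # corrupted sample does not poison the following predictions.
            trusted.append(predicted)
        else:
            trusted.append(series[t])
    return flagged


# A smooth signal with one corrupted sample at index 3 (illustrative data only).
readings = [1.0, 1.1, 1.2, 9.0, 1.4, 1.5]
suspects = flag_temporal_anomalies(readings, tolerance=0.5)
```

A learned predictor, as described in the abstract, would replace the fixed extrapolation and could also take the values at neighbouring nodes as input to exploit spatial locality.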
ISBN: 9781509021406 (print)
The proceedings contain 210 papers. The topics discussed include: towards a green, QoS-enabled heterogeneous cloud infrastructure; predicting job completion time in heterogeneous MapReduce environments; minimizing rental cost for multiple recipe applications in the cloud; providing fairness in heterogeneous multicores with a predictive, adaptive scheduler; dynamic resource management for parallel tasks in an oversubscribed energy-constrained heterogeneous environment; evaluation of emerging energy-efficient heterogeneous computing platforms for biomolecular and cellular simulation workloads; latency, power, and security optimization in distributed reconfigurable embedded systems; and a reconfigurable fixed-point architecture for adaptive beamforming.
ISBN: 9781467388153 (print)
Tensor completion is a powerful tool used to estimate or recover missing values in multi-way data. It has seen great success in domains such as product recommendation and healthcare. Tensor completion is most often accomplished via low-rank sparse tensor factorization, a computationally expensive non-convex optimization problem which has only recently been studied in the context of parallel computing. In this work, we study three optimization algorithms that have been successfully applied to tensor completion: alternating least squares (ALS), stochastic gradient descent (SGD), and coordinate descent (CCD++). We explore opportunities for parallelism on shared- and distributed-memory systems and address challenges such as memory- and operation-efficiency, load balance, cache locality, and communication. Among our advancements are an SGD algorithm which combines stratification with asynchronous communication, an ALS algorithm rich in level-3 BLAS routines, and a communication-efficient CCD++ algorithm. We evaluate our optimizations on a variety of real datasets using a modern supercomputer and demonstrate speedups through 1024 cores. These improvements effectively reduce time-to-solution from hours to seconds on real-world datasets. We show that after our optimizations, ALS is advantageous on parallel systems of small-to-moderate scale, while both ALS and CCD++ will provide the lowest time-to-solution on large-scale distributed systems.
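For readers unfamiliar with the SGD variant named above, a minimal serial sketch follows. It is not the stratified, asynchronous algorithm of the paper: it assumes a three-way tensor, a rank-R CP model, squared-error loss with L2 regularization, and a plain Python representation in which `observed` maps index triples to known values. All names and hyperparameters are hypothetical.

```python
import random


def init_factors(dims, rank, seed):
    """One factor matrix per mode, with small positive random entries."""
    rng = random.Random(seed)
    return [[[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n)]
            for n in dims]


def predict(factors, index):
    """CP model: sum over components of the product of one row entry per mode."""
    rows = [factors[mode][i] for mode, i in enumerate(index)]
    total = 0.0
    for r in range(len(rows[0])):
        term = 1.0
        for row in rows:
            term *= row[r]
        total += term
    return total


def sgd_step(factors, index, value, lr, reg):
    """Update the three factor rows touched by one observed entry (i, j, k)."""
    i, j, k = index
    a, b, c = factors[0][i], factors[1][j], factors[2][k]
    err = value - predict(factors, index)
    for r in range(len(a)):
        ar, br, cr = a[r], b[r], c[r]  # pre-update values for all three gradients
        a[r] += lr * (err * br * cr - reg * ar)
        b[r] += lr * (err * ar * cr - reg * br)
        c[r] += lr * (err * ar * br - reg * cr)


def rmse(factors, observed):
    """Root-mean-square error over the observed entries."""
    sq = sum((v - predict(factors, idx)) ** 2 for idx, v in observed.items())
    return (sq / len(observed)) ** 0.5


def complete(observed, dims, rank, epochs=200, lr=0.05, reg=0.001, seed=0):
    """Fit CP factors to the observed entries with shuffled serial SGD."""
    factors = init_factors(dims, rank, seed)
    entries = list(observed.items())
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(entries)
        for index, value in entries:
            sgd_step(factors, index, value, lr, reg)
    return factors


# A 2 x 2 x 2 rank-1 tensor with the entry (1, 1, 1) held out (illustrative data only).
u = [1.0, 1.5]
observed = {(i, j, k): u[i] * u[j] * u[k]
            for i in range(2) for j in range(2) for k in range(2)
            if (i, j, k) != (1, 1, 1)}
factors = complete(observed, dims=(2, 2, 2), rank=2)
estimate = predict(factors, (1, 1, 1))  # estimate for the held-out entry
```

Each update touches only one row per factor matrix, which is why entries that share no index can be processed in parallel; stratification, as mentioned in the abstract, is one way of grouping such entries.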
ISBN: 9781509051465 (print)
Access to massive collections of geographic spatial data is one of the serious bottlenecks in large-scale data-centric applications in the big data era, such as data assimilation and urban data analytics systems. In this paper, we consider the implementation of distributed spatial indices, specifically quad trees, on a distributed computing system using the shared-nothing memory approach. We discuss static and dynamic partitioning and allocation strategies for data and queries across distributed nodes. Using scaled-down parallel data load and search experiments on a small distributed processor system as a proof of concept, we show that the proposed approach, with a collection of small indices in distributed shared-nothing memory, is more efficient than the conventional approach of a single processor with a large external index. We also observed that the proposed tree-based partitioning and assignment strategy using sampling reduces query time compared with other conventional partitioning strategies used in databases. We also discuss how to allocate a collection of small tree indices among distributed processors. These results suggest that parallelized access to databases with spatial indexing functions can enhance the throughput of large-scale data-centric applications.
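The idea of replacing one large index with a collection of small shared-nothing indices can be sketched as follows. This is an illustration under stated assumptions, not the paper's system: it uses a simple point quad tree, a static uniform grid to assign points to hypothetical workers (rather than the sampling-based tree partitioning the abstract mentions), and it runs the per-worker queries in a loop where a real system would run them in parallel.

```python
class QuadTree:
    """Point quad tree over the square with corner (x0, y0) and side `size`."""

    def __init__(self, x0, y0, size, capacity=4):
        self.x0, self.y0, self.size, self.capacity = x0, y0, size, capacity
        self.points = []
        self.children = None  # four sub-trees once this leaf has split

    def insert(self, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(x, y)
            return
        self.points.append((x, y))
        # The size guard stops endless splitting on many identical points.
        if len(self.points) > self.capacity and self.size > 1e-9:
            self._split()

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x0 + dx * half, self.y0 + dy * half, half, self.capacity)
            for dy in (0, 1) for dx in (0, 1)
        ]
        points, self.points = self.points, []
        for x, y in points:
            self._child_for(x, y).insert(x, y)

    def _child_for(self, x, y):
        half = self.size / 2
        col = 1 if x >= self.x0 + half else 0
        row = 1 if y >= self.y0 + half else 0
        return self.children[row * 2 + col]

    def query(self, qx0, qy0, qx1, qy1):
        """Return points with qx0 <= x <= qx1 and qy0 <= y <= qy1."""
        if (qx1 < self.x0 or qx0 > self.x0 + self.size
                or qy1 < self.y0 or qy0 > self.y0 + self.size):
            return []  # the query rectangle misses this node entirely
        if self.children is None:
            return [(x, y) for x, y in self.points
                    if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        found = []
        for child in self.children:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


def build_shared_nothing_indices(points, size, grid=2):
    """Static partitioning: one small quad tree per grid cell, one cell per worker."""
    cell = size / grid
    indices = {(r, c): QuadTree(c * cell, r * cell, cell)
               for r in range(grid) for c in range(grid)}
    for x, y in points:
        r = min(int(y // cell), grid - 1)
        c = min(int(x // cell), grid - 1)
        indices[(r, c)].insert(x, y)
    return indices


def distributed_query(indices, qx0, qy0, qx1, qy1):
    """Send the query to every worker's index and merge the partial results."""
    results = []
    for tree in indices.values():
        results.extend(tree.query(qx0, qy0, qx1, qy1))
    return results


points = [(1, 1), (2, 3), (6, 6), (7, 2), (3, 7), (5, 5), (9, 9)]
indices = build_shared_nothing_indices(points, size=10)
nearby = sorted(distributed_query(indices, 0, 0, 4, 4))
```

Because each worker's tree prunes queries that miss its region at the root, a query confined to one cell costs the other workers only a single bounds check.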
ISBN: 9781450343145 (print)
The proceedings contain 28 papers. The topics discussed include: routing on the dependency graph: a new approach to deadlock-free high-performance routing; towards practical algorithm based fault tolerance in dense linear algebra; new-sum: a novel online ABFT scheme for general iterative methods; SDS-Sort: scalable dynamic skew-aware parallel sorting; SWAT: a programmable, in-memory, distributed, high-performance computing platform; consecutive job submission behavior at Mira supercomputer; with extreme scale computing the rules have changed; NVL-C: static analysis techniques for efficient, correct programming of non-volatile main memory systems; algorithm-directed data placement in explicitly managed non-volatile memory; self-configuring software-defined overlay bypass for seamless inter- and intra-cloud virtual networking; Wiera: towards flexible multi-tiered geo-distributed cloud storage instances; IMPACC: a tightly integrated MPI+OpenACC framework exploiting shared memory parallelism; faster and cheaper: parallelizing large-scale matrix factorization on GPUs; implications of heterogeneous memories in next generation server systems; DD-Graph: a highly cost-effective distributed disk-based graph-processing framework; evaluation of pattern matching workloads in graph analysis systems; and BAShuffler: maximizing network bandwidth utilization in the shuffle of YARN.
With the advent of clustered systems, more and more parallel computing is required. However, a lot of programming skill is needed to write parallel code, especially when you want to benefit from the various paralle...
ISBN: 9781509035045 (print)
The proceedings contain 25 papers. The topics discussed include: elasticity based scheduling heuristic algorithm for cloud environments; benchmark generation and simulation at extreme scale; FlipSphere: a software-based DRAM error detection and correction library for HPC; investigating a science gateway for an agent-based simulation application using REPAST; fault-tolerant adaptive parallel and distributed simulation; Agents+Control: a methodology for CPSs; a lock-free O(1) event pool and its application to share-everything PDES platforms; enhanced null message algorithm for hybrid parallel simulation systems with large disparity in time step; link partitioning in parallel simulation of scale-free networks; combining interest management and dead reckoning: a hybrid approach for efficient data distribution in multiplayer online games; promoting a-priori interoperability of HLA-based simulations in the space domain: the SISO space reference FOM initiative; realisation of navigation concepts for the multi-agent flood algorithm for search & rescue scenarios using RFID tags; RA2: predicting simulation execution time for cloud-based design space explorations; real-time scheduling of reconfigurable distributed embedded systems with energy harvesting prediction; a low-cost IoT application for the urban traffic of vehicles, based on wireless sensors using GSM technology; and distributed/parallel genetic algorithm for road traffic network division using a hybrid island model/step parallelization approach.