ISBN:
(Print) 9783662433522; 9783662433515
Graph processing has become an integral part of big data analytics. With the ever-increasing size of graphs, one needs to partition them into smaller clusters, which can be managed and processed more easily on multiple machines in a distributed fashion. While there exist numerous solutions for edge-cut partitioning of graphs, very little effort has been made for vertex-cut partitioning, despite the fact that vertex-cuts have proved significantly more effective than edge-cuts for processing most real-world graphs. In this paper we present JA-BE-JA-VC, a parallel and distributed algorithm for vertex-cut partitioning of large graphs. In a nutshell, JA-BE-JA-VC is a local search algorithm that iteratively improves upon an initial random assignment of edges to partitions. We propose several heuristics for this optimization and study their impact on the final partitioning. Moreover, we employ a simulated annealing technique to escape local optima. We evaluate our solution on various graphs and with a variety of settings, and compare it against two state-of-the-art solutions. We show that JA-BE-JA-VC outperforms the existing solutions: it not only creates partitions of any requested size, but also achieves a vertex-cut that is better than its counterparts and more than 70% better than random partitioning.
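The core idea of the abstract, local search over a random edge-to-partition assignment with simulated annealing, can be illustrated with a minimal sketch. This is not the paper's algorithm (its actual heuristics and parallelization are not given here); the function names, swap move, and cooling schedule are all illustrative assumptions.

```python
import math
import random

def vertex_cut(edge_part, edges):
    # For each vertex, count the distinct partitions its incident edges fall
    # into; the vertex-cut is the total number of extra vertex replicas needed.
    parts = {}
    for e, p in zip(edges, edge_part):
        for v in e:
            parts.setdefault(v, set()).add(p)
    return sum(len(s) - 1 for s in parts.values())

def jabeja_vc_sketch(edges, k, iters=2000, t0=2.0, cooling=0.995, seed=0):
    """Toy local search: swap the partitions of two edges (preserving
    partition sizes) and accept the swap greedily or, with simulated
    annealing, occasionally accept a worse one to escape local optima."""
    rng = random.Random(seed)
    part = [rng.randrange(k) for _ in edges]     # random initial assignment
    cost = vertex_cut(part, edges)
    t = t0
    for _ in range(iters):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        if part[i] == part[j]:
            continue
        part[i], part[j] = part[j], part[i]      # candidate swap
        new_cost = vertex_cut(part, edges)
        # Accept improvements always; accept degradations with probability
        # exp(-delta / t), which shrinks as the temperature cools.
        if new_cost < cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
        else:
            part[i], part[j] = part[j], part[i]  # revert the swap
        t = max(t * cooling, 1e-6)
    return part, cost
```

Because moves are pairwise swaps, the initial partition sizes are preserved, which mirrors the paper's claim of producing partitions of any requested size.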
The cloud-native database is one of the hottest topics in database research. Storage-compute separation architecture with cloud characteristics is designed to process complex and changeable workloads to reduce resourc...
As big data, medical digitalization, and wearable devices continue to evolve, these technologies are driving the advancement of clinical medicine, genomics, and wearable health while also posing a risk of privacy brea...
ISBN:
(Print) 3540652248
Constraint propagation algorithms present inherent parallelism. Each constraint behaves as a concurrent process triggered by changes in the store of variables, updating the store in its turn. There is an inherent sequentiality as well, since a constraint must be executed only as the consequence of a previous execution of another constraint. We have developed different parallel execution models of constraint propagation for MIMD distributed-memory machines. We have adopted the indexical scheme, an adequate approach to achieve consistency for n-ary constraints. The proposed models arise from two techniques, dynamic and static, for scheduling constraint executions (the assignment of constraint executions to processing elements). In the static scheduling models the constraint graph is divided into N partitions, which are executed in parallel on N processors. We have investigated an important issue affecting performance: the criterion used to establish the graph partition so as to balance the run-time workload. In the dynamic scheduling models, any processor can execute any constraint, improving the workload balance. However, a coordination mechanism is required to ensure a sound order in the execution of constraints. We have designed coordination mechanisms for both centralised and distributed control schemes. Several parallel processing methods for solving Constraint Satisfaction Problems have been proposed; [1] and [3] deserve particular mention in relation to our work.
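The static scheduling problem described above, assigning constraints to N processors so that run-time workload is balanced, can be sketched with a simple greedy load-balancing heuristic. The paper's actual partitioning criterion operates on the constraint graph and is not reproduced here; the cost estimates and the longest-processing-time strategy below are illustrative assumptions.

```python
import heapq

def static_partition(constraints, n_workers):
    """Greedy longest-processing-time assignment of constraints to workers:
    repeatedly give the costliest unassigned constraint to the currently
    least-loaded worker, balancing estimated run-time workload."""
    # constraints: list of (name, estimated_cost) pairs
    heap = [(0.0, w, []) for w in range(n_workers)]  # (load, worker id, assigned)
    heapq.heapify(heap)
    for name, cost in sorted(constraints, key=lambda c: -c[1]):
        load, w, assigned = heapq.heappop(heap)      # least-loaded worker
        assigned.append(name)
        heapq.heappush(heap, (load + cost, w, assigned))
    return {w: (load, assigned) for load, w, assigned in heap}
```

In a real propagation engine the dependency edges of the constraint graph would also matter (constraints sharing variables should ideally be co-located to reduce cross-processor store updates), which is exactly the issue the static models investigate.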
ISBN:
(Print) 9781479959273
The rise of the cloud and distributed data-intensive ("Big Data") applications puts pressure on data center networks due to the movement of massive volumes of data. This paper proposes CodHoop, a system employing network coding techniques, specifically index coding, as a means of dynamically controlled reduction in the volume of communication. Using Hadoop as a representative of this class of applications, a motivating use case is presented. The proof-of-concept implementation results exhibit an average advantage of 31% compared to a vanilla Hadoop implementation, which, depending on the use case, translates to 31% less energy utilization of the equipment, 31% more jobs running simultaneously, or a 31% decrease in job completion time.
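The index-coding idea behind CodHoop can be shown in a few lines: when each receiver already holds the block the other one wants as side information, a single XOR-coded broadcast replaces two unicast transmissions. The block names below are made up for illustration; this is the textbook two-receiver case, not CodHoop's implementation.

```python
def xor_bytes(a, b):
    # Bytewise XOR of two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

# Two reducers each want a block the other already holds as side information
# (a pattern that can arise in a Hadoop shuffle). One coded broadcast suffices.
block_a = b"mapper-output-A!"
block_b = b"mapper-output-B!"          # equal length, so the XOR lines up

coded = xor_bytes(block_a, block_b)    # the single broadcast packet

# Reducer 1 holds block_b and recovers block_a by XORing it back out:
recovered_a = xor_bytes(coded, block_b)
# Reducer 2 holds block_a and recovers block_b the same way:
recovered_b = xor_bytes(coded, block_a)
```

The saving here is one transmission out of two (50%); in practice the achievable gain depends on how much side information overlaps across receivers, which is consistent with the averaged 31% figure reported.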
The recent development on semiconductor process and design technologies enables multi-core processors to become a dominant market trend in desk-top PCs as well as high end mobile devices. At the same time, the increas...
The Single Source Shortest Path (SSSP) problem consists in finding the shortest paths from a vertex (the source vertex) to all other vertices in a graph. SSSP has numerous applications. For some algorithms and applica...
ISBN:
(Print) 9783662433515
The proceedings contain 16 papers. The topics discussed include: a risk-based model for service level agreement differentiation in cloud market providers; adaptive and scalable high availability for infrastructure clouds; trust-aware operation of providers in cloud markets; scaling HDFS with a strongly consistent relational model for metadata; distributed exact deduplication for primary storage infrastructures; scalable and accurate causality tracking for eventually consistent stores; cooperation across multiple healthcare clinics on the cloud; behave: behavioral cache for web content; implementing the WebSocket protocol based on formal modeling and automated code generation; autonomous multi-dimensional slicing for large-scale distributed systems; and bandwidth-minimized distribution of measurements in global sensor networks.
ISBN:
(Print) 9783319495835; 9783319495828
In recent years not only has a growth of data-intensive storage been observed, but compute-intensive workloads also need high computing power and high parallelism with good performance and great scalability. Many distributed filesystems have focused on how to distribute data across multiple processing nodes, but one of the main problems to solve is the management of an ever-greater number of metadata requests. In fact, some studies have identified that optimized metadata management is a key factor in achieving good performance. Applications in high performance computing usually require filesystems able to provide a huge number of operations per second to achieve the required level of performance. Although metadata storage is smaller than data storage, metadata operations consume many CPU cycles, so a single metadata server is no longer sufficient. In this paper we define a completely distributed method that provides efficient metadata management and seamlessly adapts to general-purpose and scientific-computing filesystem workloads. The throughput performance is measured with a metadata benchmark and compared against several distributed filesystems. The results show great scalability for create operations on a single directory accessed by multiple clients.
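One common way to distribute metadata load across several servers, rather than funnel every request through a single metadata server, is to hash file paths onto a consistent-hash ring. The paper's own distribution method is not detailed in this abstract; the class below is a generic sketch with made-up server names.

```python
import bisect
import hashlib

class MetadataRing:
    """Consistent-hash ring mapping file paths to metadata servers, so
    metadata requests spread across servers and adding or removing a
    server only remaps a fraction of the namespace."""

    def __init__(self, servers, vnodes=64):
        # Each server is placed on the ring at many virtual points to
        # smooth out the load distribution.
        self.ring = sorted((self._hash(f"{s}#{v}"), s)
                           for s in servers for v in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 as an integer ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def server_for(self, path):
        # The owning server is the first ring point clockwise of the
        # path's hash, wrapping around at the end of the ring.
        i = bisect.bisect(self.keys, self._hash(path)) % len(self.ring)
        return self.ring[i][1]
```

Hot single-directory workloads, like the create benchmark mentioned above, additionally need the directory's own entries spread over servers (e.g. by hashing `directory + filename`), which the per-path hashing here already provides.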
ISBN:
(Print) 9781728162515
Moving loads such as cars and trains are very useful sources of seismic waves, which can be analyzed to retrieve information on the seismic velocity of subsurface materials using the techniques of ambient noise seismology. This information is valuable for a variety of applications such as geotechnical characterization of the near-surface, seismic hazard evaluation, and groundwater monitoring. However, for such processes to converge quickly, data segments with appropriate noise energy should be selected. Distributed Acoustic Sensing (DAS) is a novel sensing technique that enables acquisition of these data at very high spatial and temporal resolution over tens of kilometers. One major challenge when utilizing DAS technology is the large volume of data produced, which presents a significant Big Data challenge in finding regions of useful energy. In this work, we present a highly scalable and efficient approach to process real, complex DAS data by integrating physics knowledge acquired during a data exploration phase, followed by deep supervised learning to identify "useful" coherent surface waves generated by anthropogenic activity, a class of seismic waves that is abundant in these recordings and is useful for geophysical imaging. Data exploration and training were done on 130 gigabytes (GB) of DAS measurements. Using parallel computing, we were able to run inference on an additional 170 GB of data (the equivalent of 10 days' worth of recordings) in less than 30 minutes. Our method provides interpretable patterns describing the interaction of ground-based human activities with the buried sensors.