ISBN: 9781479986484 (print)
In this paper, we study parallel data access on distributed file systems, e.g., the Hadoop file system. Our experiments show that parallel data read requests are often served remotely and in an imbalanced fashion. This results in serious disk access and data transfer contention on certain cluster/storage nodes. We conduct a complete analysis of how remote and imbalanced read patterns occur and how they are affected by the size of the cluster. We then propose a novel method to Optimize parallel Data Access on distributed File systems, referred to as Opass. The goal of Opass is to reduce remote parallel data accesses and achieve a higher balance of data read requests between cluster nodes. To achieve this goal, we represent the data read requests that are issued by parallel applications to cluster nodes as a graph data structure where edge weights encode the demands of data locality and load capacity. Then we propose new matching-based algorithms to match processes to data based on the configurations of the graph data structure so as to compute the maximum degree of data locality and balanced access. Our proposed method can benefit parallel data-intensive analysis with various parallel data access strategies. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results from both benchmark and well-known parallel applications show the performance benefits and scalability of Opass.
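The idea of matching processes to data under locality and capacity constraints can be illustrated with a small sketch. This is not the Opass algorithm itself, which the abstract does not specify; it is a plain augmenting-path bipartite matching, assuming each process may take at most `quota` chunks and may only be matched to chunks stored on its own node. The names `match_chunks_to_processes`, `local_chunks`, and `quota` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def match_chunks_to_processes(
    local_chunks: Dict[str, List[str]], quota: int
) -> Dict[str, str]:
    """Assign chunks to processes that hold a local replica.

    local_chunks maps a process name to the chunks stored on its node.
    quota caps how many chunks one process may take (assumed balance rule).
    Returns a chunk -> process mapping; unmatched chunks are omitted.
    """
    # Invert the input: for every chunk, which processes can read it locally?
    holders: Dict[str, List[str]] = {}
    for proc, chunks in local_chunks.items():
        for chunk in chunks:
            holders.setdefault(chunk, []).append(proc)

    # Each process offers `quota` slots, so the capacity-limited problem
    # becomes ordinary bipartite matching between chunks and slots.
    slot_owner: Dict[Tuple[str, int], str] = {}

    def try_assign(chunk: str, visited: Set[Tuple[str, int]]) -> bool:
        for proc in holders[chunk]:
            for i in range(quota):
                slot = (proc, i)
                if slot in visited:
                    continue
                visited.add(slot)
                current = slot_owner.get(slot)
                # Take a free slot, or evict the current chunk if it can
                # be moved to another local slot (augmenting path).
                if current is None or try_assign(current, visited):
                    slot_owner[slot] = chunk
                    return True
        return False

    for chunk in holders:
        try_assign(chunk, set())

    return {chunk: proc for (proc, _), chunk in slot_owner.items()}


# Chunk c0 is first given to p0, then moved to p1 so that c1 can stay local.
assignment = match_chunks_to_processes({"p0": ["c0", "c1"], "p1": ["c0"]}, quota=1)
```

A greedy first-fit assignment would leave `c1` without a local reader in this example; the augmenting step is what recovers full locality.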
ISBN: 9781479961436 (print)
The initialization of distributed heterogeneous simulation systems presents challenges regarding the parallelization of object construction and setup. This paper presents a method for parallel initialization of distributed simulation systems that consists of a two-phase setup. Object instantiation and setup are split into Config and Post Bind phases to permit fast creation times, allowing distribution of initialization tasks among different nodes and removing the ordering requirement between the initialization of interdependent objects. A framework of references is presented to facilitate the use of remote objects in an MPI environment, using proxies to access local and remote variables, served by a reference name server built into the simulation engine.
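A minimal single-process sketch may help show why splitting setup into two phases removes the ordering requirement. It is only an illustration under stated assumptions: the class names `NameServer` and `SimObject` and the method names `config` and `post_bind` are hypothetical, and the MPI distribution and proxy layer described in the abstract are left out.

```python
class NameServer:
    """Hypothetical registry that maps reference names to objects."""

    def __init__(self):
        self._objects = {}

    def register(self, name, obj):
        self._objects[name] = obj

    def resolve(self, name):
        return self._objects[name]


class SimObject:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = tuple(depends_on)
        self.links = {}

    def config(self, server):
        # Phase 1: local setup and registration only; no peer lookups,
        # so objects can be configured in any order (or on any node).
        server.register(self.name, self)

    def post_bind(self, server):
        # Phase 2: every object is registered by now, so references
        # to peers can be resolved by name.
        self.links = {name: server.resolve(name) for name in self.depends_on}


def initialize(objects, server):
    for obj in objects:
        obj.config(server)
    for obj in objects:
        obj.post_bind(server)


# Two mutually dependent objects initialize without any ordering constraint.
server = NameServer()
engine = SimObject("engine", depends_on=["sensor"])
sensor = SimObject("sensor", depends_on=["engine"])
initialize([engine, sensor], server)
```

With a single-phase constructor, one of the two objects above would have to exist before the other; deferring reference resolution to the second phase avoids that.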
ISBN: 0818620528 (print)
Algorithms for processing distributed queries require a priori estimates of the size of intermediate relations. Most such algorithms take a “static” approach in which the algorithm is completely determined before processing begins. If size estimates are found to be inaccurate at some intermediate stage, there is no opportunity to re-schedule, and the result may be far from optimal. Adaptive query execution may be used to alleviate the problem. Care is necessary, though, to ensure that the delay associated with re-scheduling does not exceed the time saved through the use of a more efficient strategy. This paper presents a low overhead delay method to decide when to correct a strategy. Sampling is used to estimate the size of relations, and alternative heuristic strategies prepared in a background mode are used to decide when to correct. Correction is made only if lower overall delay is achieved, including correction time. Evaluation using a model of a distributed data base indicates that the heuristic strategies are near optimal. Moreover, it also suggests that it is usually correct to abort creation of an intermediate relation which is much larger than predicted.
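The correction rule stated in the abstract (switch strategies only if the overall delay, including correction time, is lower) can be written as a one-line test. The sketch below is an assumption-laden illustration: the function name, its parameters, and the idea that all three quantities are available as estimated delays in the same time unit are hypothetical.

```python
def should_correct(remaining_current: float,
                   remaining_alternative: float,
                   correction_time: float) -> bool:
    """Return True if switching strategies lowers the overall delay.

    remaining_current: estimated delay to finish with the current strategy.
    remaining_alternative: estimated delay to finish with the alternative.
    correction_time: estimated cost of re-scheduling itself.
    """
    return remaining_alternative + correction_time < remaining_current


# Switching saves 40 time units but costs 15 to re-schedule: worth it.
switch = should_correct(remaining_current=100.0,
                        remaining_alternative=60.0,
                        correction_time=15.0)

# Switching saves 10 time units but costs 15 to re-schedule: stay put.
stay = should_correct(remaining_current=100.0,
                      remaining_alternative=90.0,
                      correction_time=15.0)
```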
ISBN: 076950728X (print)
In some scenarios involving on-line transaction processing within a distributed database, it is desirable to synchronize transactions in a manner that guarantees conflict equivalence with a serial schedule ordered by original transaction start times while providing each transaction with an anomaly serializable isolation. Few theoretical concurrency control algorithms guarantee such a conflict equivalence, and we are unaware of any protocol that accomplishes this while supporting real-world issues such as out-of-order transaction messages, out-of-order operation executions, and out-of-order transaction committals without the burden of explicit readset and writeset declarations. We describe an algorithm that provides this guarantee and supports these issues while requiring only table-level writeset declarations.
ISBN: 0780321754 (print)
Advances in communication for parallel programming have yielded one-sided messaging systems. The MPI bindings for Ruby have been augmented to include the remote memory access functions of MPI-2.
ISBN: 9780769542515 (print)
This paper describes a novel approach to parallel simulation of complex multi-agent systems which is based on actors and the Java middleware Terracotta. The approach aims to exploit the computing power of modern multi-core machines. Terracotta was chosen because it allows the JVM to be clustered transparently. The paper discusses design and implementation aspects of the approach, and demonstrates the achievable execution performance through the parallel simulation of a scalable multi-agent system based on the predator/prey model.
ISBN: 9781479942930 (print)
As processors and systems on chip in the embedded world increasingly become multicore, parallel programming remains a difficult, time-consuming and complicated task. End users who are not parallel programming experts have a need to exploit such processors and architectures, using high level programming languages, like Scilab or MATLAB. The ALMA toolset solves this problem: it takes Scilab code as input and produces parallel code for embedded multiprocessor systems on chip, using platform quasi-agnostic optimizations. The platform information is provided by an architecture description language designed for the purpose of a flexible system description as well as simulation. A hierarchical system description in combination with a parameterizable simulation environment allows fine-grained trade-offs between simulation performance and simulation accuracy.
ISBN: 0818677937 (print)
We introduce the all-software, standard C++-based Aurora distributed shared data system. As with related systems, it provides a shared data abstraction on distributed memory hardware. An innovation in Aurora is the use of scoped behaviour for per-context data sharing optimizations, where a context is a portion of source code such as a loop or phase. With scoped behaviour, a new language scope (e.g., nested braces) can be used to optimize the data sharing behaviour of the selected source code. Different scopes and different shared data can be optimized in different ways. Thus, scoped behaviour provides a novel level of flexibility to incrementally tune the parallel performance of an application.
ISBN: 9781728138015 (print)
The amount of data generated by social media, social networks and distributed platforms such as blockchain has reached quite high levels. Various data analysis methods could be applied to this big data. One of these methods is to classify geo-tagged social network data in order to report the geographical area associated with the data. We propose an efficient parallel classification approach and implement a classifier tool which is capable of processing huge amounts of data. To test our approach, we collect Twitter data over the five densest areas of Turkey. There are important factors affecting the classification performance, such as the spatial indexing and the parallelization strategies. Hierarchical Triangular Mesh (HTM) and R-Tree spatial indexes are used for indexing regions. For parallel processing of data streams, the classifier tool is implemented on the Apache Spark and Kafka platforms in order to obtain high scalability. To show the effectiveness of our method, we perform tests on the Amazon Web Services (AWS) Cloud environment and compare our method against a method which implements HTM on a Microsoft SQL Server. Results show that a 1.6- to 4.5-fold speed-up is obtained and that Twitter data collected over a month can be processed effectively in three hours.
ISBN: 0818675829 (print)
Dedicated Cluster Parallel Computers (DCPCs) are emerging as low-cost high performance environments for many important applications in science and engineering. A significant class of applications that perform well on a DCPC are coarse-grain applications that involve large amounts of file I/O. Current research in parallel file systems for distributed systems is providing a mechanism for adapting these applications to the DCPC environment. We present the Parallel Virtual File System (PVFS), a system that provides disk striping across multiple nodes in a distributed parallel computer and file partitioning among tasks in a parallel program. PVFS is unique among similar systems in that it uses a streams-based approach that represents each file access with a single set of request parameters and decouples the number of network messages from details of the file's striping and partitioning. PVFS also provides support for efficient collective file accesses and allows overlapping file partitions. We present results of early performance experiments that show PVFS achieves excellent speedups in accessing moderately sized file segments.
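Disk striping across nodes can be made concrete with a short address calculation. The sketch below assumes a simple round-robin layout with a fixed stripe size, which is a common scheme but is not confirmed by the abstract as the PVFS layout; the function name `locate` and its parameters are hypothetical.

```python
from typing import Tuple


def locate(offset: int, stripe_size: int, num_nodes: int) -> Tuple[int, int]:
    """Map a byte offset in a striped file to (node index, offset on that node).

    Assumes stripes are dealt out round-robin: stripe 0 to node 0,
    stripe 1 to node 1, and so on, wrapping after num_nodes stripes.
    """
    stripe_index = offset // stripe_size
    node = stripe_index % num_nodes
    # Each full round places one stripe on every node, so this node already
    # holds (stripe_index // num_nodes) earlier stripes of the file.
    local_offset = (stripe_index // num_nodes) * stripe_size + offset % stripe_size
    return node, local_offset


# With 64-byte stripes on 4 nodes, byte 300 falls in stripe 4,
# which wraps around to node 0 as that node's second stripe.
node, local_offset = locate(300, stripe_size=64, num_nodes=4)
```

Under this layout, a large contiguous read touches every node in turn, which is what lets the aggregate disk bandwidth of the cluster be used for a single file.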