Summary form of only given: Apache Hadoop has become the platform of choice for developing large-scale data-intensive applications. In this tutorial, we will discuss design philosophy of Hadoop, describe how to design...
详细信息
Summary form of only given: Apache Hadoop has become the platform of choice for developing large-scale data-intensive applications. In this tutorial, we will discuss design philosophy of Hadoop, describe how to design and develop Hadoop applications and higher-level application frameworks to crunch several terabytes of data, using anywhere from four to 4,000 computers. We will discuss solutions to common problems encountered in maximizing Hadoop application performance. We will also describe several frameworks and utilities developed using Hadoop that increase programmer-productivity and application-performance.
The recent popularity of the Java programming language has brought automatic dynamic memory management (a.k.a., the garbage collection) into the mainstream. Traditional garbage collectors suffer from long garbage coll...
详细信息
ISBN:
(纸本)0780373715
The recent popularity of the Java programming language has brought automatic dynamic memory management (a.k.a., the garbage collection) into the mainstream. Traditional garbage collectors suffer from long garbage collection pauses (stop-the-world mark-sweep algorithm) or inability of collecting cyclic garbage (reference counting approach). Generational garbage collection, however, is based only on the weak generational hypothesis that most objects die young. In this paper, the performance evaluation of a new multithreaded concurrent generational garbage collector (MCGC) based on mark-sweep with the assistance of reference counting is reported. The MCGC can take advantage of multiple CPUs in an SMP system and the merits of lightweight processes. Furthermore, the long garbage collection pause can be reduced and the garbage collection efficiency can be enhanced. Measurement results indicate that the MCGC improves the garbage collection pause time up to 96.75% over the traditional stop-the-world mark-sweep garbage collector. Moreover, the MCGC receives minimal time and space penalties as shown in the report of the total execution time, the memory footprint and the sticky reference count rate.
This work is devoted to the problem of detecting and processing faults of computing nodes during execution of parallel programs on distributed computing systems. The fault tolerance tools of PBS/TORQUE are considered....
详细信息
ISBN:
(纸本)9781728129877
This work is devoted to the problem of detecting and processing faults of computing nodes during execution of parallel programs on distributed computing systems. The fault tolerance tools of PBS/TORQUE are considered. The functional model for faults handling optimization are proposed.
In this paper we analyze the teaching and learning of parallel processing through performance analysis using a software tool called Prober. This tool is a functional and performance analyzer of parallel programs that ...
详细信息
In this paper we analyze the teaching and learning of parallel processing through performance analysis using a software tool called Prober. This tool is a functional and performance analyzer of parallel programs that we proposed and developed during an undergraduate research project. Our teaching and learning approach consists of a practical class where students receive explanations about some concepts of parallel processing and the use of the tool. They do some oriented and simple performance tests on parallel programs and analyze their results using Prober as a single aid tool. Finally, students answer a self-assessment questionnaire about their formation, their knowledge of parallel processing concepts and also about the usability of Prober. Our main goal is to show that students can learn concepts of parallel processing in a clearer, faster and more efficient way using our approach.
parallelism is a suitable approach for speeding up the massive computations of applications, but parallel programming is difficult yet. Algorithmic skeleton is a parallel programming model that provides a high level o...
详细信息
parallelism is a suitable approach for speeding up the massive computations of applications, but parallel programming is difficult yet. Algorithmic skeleton is a parallel programming model that provides a high level of abstraction for programmers. This approach uses the pre-defined components to facilitate easier parallel programming. Divide and conquer (DC) is an appropriate parallel pattern for implementation as a skeleton. The solution of the original problem is obtained by dividing it into smaller sub-problems and solving them in parallel. Today, graphics processor unit (GPU) is an attractive computational processor for doing tasks in parallel, because it has a large number of process units. In this paper, divide and conquer skeleton on GPU has been proposed and named OC_***_GPU is a divide and conquer skeleton that is implemented on GPU that using a consistent programming interface in C++ for easier parallel programming. Performance of this skeleton has been evaluated by mergesort and sobeledge detection. The results show that obtained speedup at this skeleton is more than 2 on GPU.
A data-graph computation — popularized by such programming systems as Galois, Pregel, GraphLab, PowerGraph, and GraphChi — is an algorithm that performs local updates on the vertices of a graph. During each round of...
详细信息
ISBN:
(纸本)9781450328210
A data-graph computation — popularized by such programming systems as Galois, Pregel, GraphLab, PowerGraph, and GraphChi — is an algorithm that performs local updates on the vertices of a graph. During each round of a data-graph computation, an update function atomically modifies the data associated with a vertex as a function of the vertex's prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next *** paper introduces PRISM, a chromatic-scheduling algorithm for executing dynamic data-graph computations. PRISM uses a vertex-coloring of the graph to coordinate updates performed in a round, precluding the need for mutual-exclusion locks or other nondeterministic data synchronization. A multibag data structure is used by PRISM to maintain a dynamic set of active vertices as an unordered set partitioned by color. We analyze PRISM using work-span analysis. Let G=(V,E) be a degree-Δ graph colored with Χ colors, and suppose that Q⊆V is the set of active vertices in a round. Define size(Q)=[Q] + Σv∈Qdeg(v), which is proportional to the space required to store the vertices of Q using a sparse-graph layout. We show that a P-processor execution of PRISM performs updates in Q using O(Χ(lg (Q/Χ)+lgΔ)+ lgP) span and Θ(size(Q)+Χ+P) work. These theoretical guarantees are matched by good empirical performance. We modified GraphLab to incorporate PRISM and studied seven application benchmarks on a 12-core multicore machine. PRISM executes the benchmarks 1.2–2.1 times faster than GraphLab's nondeterministic lock-based scheduler while providing deterministic *** paper also presents PRISM-R, a variation of PRISM that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations. PRISM-R satisfies the same theoretical bounds as PRISM, but its implementation is
The computation of geodesic distances is an important research topic in Geometry Processing and 3D Shape Analysis as it is a basic component of many methods used in these areas. In this work, we present a minimalistic...
详细信息
A study is reported whose aim was to produce a system to facilitate offline programming of robots and to provide a testbed for alternative algorithms for the services provided. The system was specified using the forma...
详细信息
A study is reported whose aim was to produce a system to facilitate offline programming of robots and to provide a testbed for alternative algorithms for the services provided. The system was specified using the formal description technique LOTOS (language of temporal ordering specification). LOTOS is best known for its use in the description of OSI protocols and is supported by an ISO standard. LOTOS consists of a process algebra for specifying the structure of the system and the interactions between components of the system, and an algebraic data typing mechanism for specifying the operations the system carries out. The description of the system was heavily influenced by techniques used in the design of operating systems. Concurrency was introduced at the initial design stage, there was an explicit separation of concerns and the specification was structured hierarchically, with actions at one level appearing atomic to the next higher level. Each level in the hierarchy provides an increasingly abstract view of the robot. The resulting description was executed, or animated, using the SEDOS tool, to help determine that the correct behaviour had been encapsulated by the description. The specification was then implemented on a network of transputers, using 3L parallel Pascal.< >
The mpC (message-passing C) language was developed to write efficient and portable programs for wide range of distributed memory machines. It supports both task and data parallelism, allows both static and dynamic pro...
详细信息
The mpC (message-passing C) language was developed to write efficient and portable programs for wide range of distributed memory machines. It supports both task and data parallelism, allows both static and dynamic process and communication structures, enables optimizations aimed at both communication and computation, and supports modular parallel programming and the development of a library of parallel programs. The language is an ANSI C superset based on the notion of a network comprising processor nodes of different types and performances, connected with links of different bandwidths. The user can describe a network topology, create and discard networks, and distribute data and computations over networks. The mpC programming environment uses the topological information at run-time to ensure the efficient execution of the application. This paper describes the implementation of network management in the mpC programming environment.
Heterogeneous parallel systems are becoming mainstream computing platforms nowadays. One of the main challenges the development community is currently facing is how to fully exploit the available computational power w...
详细信息
ISBN:
(纸本)9781479905874
Heterogeneous parallel systems are becoming mainstream computing platforms nowadays. One of the main challenges the development community is currently facing is how to fully exploit the available computational power when porting existing programs or developing new ones with available techniques. In this direction, several design space exploration methods have been presented and extensively adopted. However, defining the feasible design space of a dynamic dataflow program still remains an open issue. This paper proposes a novel methodology for defining such a space through a serial execution. Homotopy theoretic methods are used to demonstrate how the design space of a program can be reconstructed from its serial execution trajectory. Moreover, the concept of dependencies graph of a dataflow program defined in the literature is extended with the definition of two new kinds of dependencies - the Guard Enable and Disable - and the 3-tuple notion needed to represent them.
暂无评论