Summary form only given. Apache Hadoop has become the platform of choice for developing large-scale data-intensive applications. In this tutorial, we will discuss the design philosophy of Hadoop and describe how to design and develop Hadoop applications and higher-level application frameworks to crunch several terabytes of data, using anywhere from four to 4,000 computers. We will discuss solutions to common problems encountered in maximizing Hadoop application performance. We will also describe several frameworks and utilities developed using Hadoop that increase programmer productivity and application performance.
Efforts to support high performance computing (HPC) applications' requirements in the context of cloud computing have motivated us to design HPC Shelf, a cloud computing services platform to build and deploy large-scale parallel computing systems. We introduce Alite, the contextual contract system of HPC Shelf, which selects component implementations according to the requirements of the host application, the characteristics of the target parallel computing platform (e.g., clusters and MPPs), quality of service (QoS) properties, and cost restrictions. Alite is evaluated through a small-scale case study employing two complementary component-based frameworks. The first represents components that implement linear algebra computations based on the BLAS interface. The second represents parallel computing platforms on the IaaS cloud offered by the Amazon EC2 service.
ISBN: 9781479979301 (print)
Satellite remote sensing radar technologies provide powerful tools for geohazard monitoring and risk management at synoptic scale. In particular, advanced Multi-Temporal SAR Interferometric algorithms are capable of detecting ground deformations and structural instabilities with millimetric precision, but impose strong requirements in terms of hardware resources. Recent advances in GPU computing and programming hold promise for time-efficient implementation of imaging algorithms, thus enhancing the development of advanced Emergency Management Services based on Earth Observation technologies. In this study, a preliminary assessment of the potential of GPU processing is carried out by comparing CPU (single- and multi-thread) and GPU implementations of time-consuming InSAR algorithm kernels. In particular, the study focuses on the fine coregistration of SAR interferometric pairs, a crucial step in the interferogram generation process. Experimental results are discussed.
ISBN: 9781479905874 (print)
Heterogeneous parallel systems are becoming mainstream computing platforms. One of the main challenges the development community currently faces is how to fully exploit the available computational power when porting existing programs or developing new ones with available techniques. In this direction, several design space exploration methods have been presented and extensively adopted. However, defining the feasible design space of a dynamic dataflow program still remains an open issue. This paper proposes a novel methodology for defining such a space through a serial execution. Homotopy theoretic methods are used to demonstrate how the design space of a program can be reconstructed from its serial execution trajectory. Moreover, the concept of the dependency graph of a dataflow program defined in the literature is extended with the definition of two new kinds of dependencies - the Guard Enable and Disable - and the 3-tuple notion needed to represent them.
With the shrinking of transistors continuing to follow Moore's Law and the non-scalability of conventional out-of-order processors, multi-core systems are becoming the design choice for industry. The burden of extracting performance is thus largely shifted from the hardware to the programmer/compiler camp, who now have to expose thread-level parallelism (TLP) to the underlying system in the form of explicitly parallel applications. Unfortunately, parallel programming is hard and error-prone. The programmer has to parallelize the work, perform the data placement, and deal with thread synchronization. Systems that support speculative multithreaded execution, such as Thread Level Speculation (TLS), offer an interesting alternative since they relieve the programmer from the burden of parallelizing applications and correctly synchronizing them. Since systems that support speculative multithreading usually treat all threads equally, they are energy-inefficient. This inefficiency stems from the fact that speculation occasionally fails and, thus, power is spent on threads that will have to be discarded. In this paper we propose a power allocation scheme for TLS systems, based on Dynamic Voltage and Frequency Scaling (DVFS), that tries to remedy this inefficiency. More specifically, we propose a profitability-based power allocation scheme, where we "steal" power from non-profitable threads and use it to speed up more useful ones. We evaluate our techniques for a state-of-the-art TLS system and show that, with minimal hardware support, they lead to improvements in ED of up to 39.6% with an average of 21.2%, for a subset of the SPEC 2000 Integer benchmark suite.
Parsec is a parallel programming environment whose goal is to simplify the development of multicomputer programs without, as is often the case, sacrificing performance. We have reconciled these objectives by "compiling" the structure of parallel applications into information to configure each of a small set of communication primitives on a context-sensitive basis. In this paper, we show how Parsec can be used to implement a high-performance processor farm and compare Parsec and hand-optimized implementations to demonstrate that Parsec can achieve a similar level of performance. Extensive static analysis and optimization are necessary to achieve these results. We discuss both the tools that perform these tasks and the user interface that provides the necessary declarative structural information. Using the processor farm, we show how Parsec simplifies the task of specifying the structure of a parallel application and improves the result by supporting abstraction, reuse and scalability.
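For readers unfamiliar with the processor-farm pattern named in this abstract, the following is a minimal sketch in Python. It is not Parsec and does not reflect its primitives or interface; it assumes a thread pool from the standard library as the worker farm, and the names `work` and `run_farm` and the squaring task are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def work(task):
    # Hypothetical unit of work: square an integer.
    return task * task


def run_farm(tasks, num_workers=4):
    """Farm independent tasks out to a pool of workers.

    A master hands tasks to whichever worker is free and gathers
    the results; pool.map returns them in the original task order.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(work, tasks))


if __name__ == "__main__":
    print(run_farm(range(8)))
```

The pattern only applies when tasks are independent of one another, which is what lets the number of workers change without altering the result.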
In this paper, we propose a new family of interconnection networks, called cyclic networks (CNs), in which an intercluster connection is defined on a set of nodes whose addresses are cyclic shifts of one another. The node degrees of basic CNs are independent of system size, but can vary from a small constant (e.g., 3) to as large as required, thus providing flexibility and effective tradeoff between cost and performance. The diameters of suitably constructed CNs can be asymptotically optimal within their lower bounds, given the degrees. We show that packet routing and ascend/descend algorithms can be performed in $\Theta(\log_d N)$ communication steps on some CNs with $N$ nodes of degree $\Theta(d)$. Moreover, CNs can also efficiently emulate homogeneous product networks (e.g., hypercubes and high dimensional meshes). As a consequence, we obtain a variety of efficient algorithms on such networks, thus proving the versatility of CNs.
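To make the phrase "addresses are cyclic shifts of one another" concrete, here is a small Python sketch that enumerates the distinct cyclic shifts of a node address. It assumes an address is a tuple of digits; the function name `cyclic_shifts` is hypothetical, and the sketch illustrates only the grouping of addresses, not the paper's full network construction.

```python
def cyclic_shifts(address):
    """Return the distinct cyclic left shifts of an address.

    Shifts are listed in order of first appearance, starting with
    the address itself. Periodic addresses yield fewer shifts.
    """
    address = tuple(address)
    seen = []
    for k in range(len(address)):
        shifted = address[k:] + address[:k]
        if shifted not in seen:
            seen.append(shifted)
    return seen


if __name__ == "__main__":
    # Nodes sharing one intercluster connection under the assumption above.
    print(cyclic_shifts((0, 1, 2)))
```

An address of length n has at most n distinct shifts, and fewer when it is periodic, as with `(0, 1, 0, 1)`.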
A study is reported whose aim was to produce a system to facilitate offline programming of robots and to provide a testbed for alternative algorithms for the services provided. The system was specified using the formal description technique LOTOS (language of temporal ordering specification). LOTOS is best known for its use in the description of OSI protocols and is supported by an ISO standard. LOTOS consists of a process algebra for specifying the structure of the system and the interactions between components of the system, and an algebraic data typing mechanism for specifying the operations the system carries out. The description of the system was heavily influenced by techniques used in the design of operating systems. Concurrency was introduced at the initial design stage, there was an explicit separation of concerns, and the specification was structured hierarchically, with actions at one level appearing atomic to the next higher level. Each level in the hierarchy provides an increasingly abstract view of the robot. The resulting description was executed, or animated, using the SEDOS tool, to help determine that the correct behaviour had been encapsulated by the description. The specification was then implemented on a network of transputers, using 3L parallel Pascal.
The mpC (message-passing C) language was developed to write efficient and portable programs for a wide range of distributed memory machines. It supports both task and data parallelism, allows both static and dynamic process and communication structures, enables optimizations aimed at both communication and computation, and supports modular parallel programming and the development of a library of parallel programs. The language is an ANSI C superset based on the notion of a network comprising processor nodes of different types and performances, connected with links of different bandwidths. The user can describe a network topology, create and discard networks, and distribute data and computations over networks. The mpC programming environment uses the topological information at run-time to ensure the efficient execution of the application. This paper describes the implementation of network management in the mpC programming environment.