This article introduces the application of streaming media technology in ***. It covers the basic communication protocol, the characteristics of the application software, distributed systems, parallel programming, and a brief introduction to the main algorithm.
MPI derived datatypes are a powerful method to define arbitrary collections of non-contiguous data in memory and to enable non-contiguous data communication in a single MPI function call. In this paper, we employ MPI datatypes in four NAS benchmarks (MG, LU, BT, and SP) to transfer non-contiguous data. A comprehensive performance evaluation was carried out on two clusters: an Itanium-2 Myrinet cluster and a Xeon InfiniBand cluster. The results show that using datatypes can achieve performance comparable to manual packing/unpacking in the original benchmarks, even though the MPI implementations studied also perform internal packing and unpacking for non-contiguous datatype communication. In some cases, better performance can be achieved because of the reduced cost of transferring non-contiguous data: some optimizations in the MPI packing/unpacking implementations are easily overlooked when users pack and unpack manually. Our case study demonstrates that MPI datatypes simplify the implementation of non-contiguous communication and lead to application code with portable performance. We expect that with further improvement of datatype processing and datatype communication such as [10, 24], datatypes can outperform the conventional methods of non-contiguous data communication. Our modified NAS benchmarks can be used to evaluate datatype processing and datatype communication in MPI implementations.
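To make the idea of a non-contiguous layout concrete, here is a minimal pure-Python sketch of what a strided vector description (in the spirit of `MPI_Type_vector`, with `count`, `blocklength`, and `stride`) selects from a flat buffer. It does not call MPI and is not taken from the paper; the helper names `vector_offsets`, `pack_vector`, and `unpack_vector` are hypothetical. It only shows the gather/scatter work that either the user or the MPI library has to perform.

```python
def vector_offsets(count, blocklength, stride):
    """Element offsets selected by a strided layout:
    `count` blocks of `blocklength` elements, block starts `stride` apart."""
    return [b * stride + i for b in range(count) for i in range(blocklength)]


def pack_vector(buf, count, blocklength, stride, start=0):
    """Gather the selected elements into a contiguous list (manual packing)."""
    return [buf[start + off] for off in vector_offsets(count, blocklength, stride)]


def unpack_vector(packed, buf, count, blocklength, stride, start=0):
    """Scatter a contiguous list back into the strided positions of `buf`."""
    for value, off in zip(packed, vector_offsets(count, blocklength, stride)):
        buf[start + off] = value


# A 4x5 row-major matrix stored flat; one column is non-contiguous in memory.
rows, cols = 4, 5
matrix = list(range(rows * cols))
column = pack_vector(matrix, count=rows, blocklength=1, stride=cols, start=2)
print(column)  # [2, 7, 12, 17]

# Receiving side: place the packed column into column 0 of an empty matrix.
target = [0] * (rows * cols)
unpack_vector(column, target, count=rows, blocklength=1, stride=cols, start=0)
```

With a derived datatype, the layout arguments are handed to the library once and the send or receive call operates on the original buffer, so the explicit pack and unpack steps disappear from user code.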
Grid computing has great potential but to enter the mainstream it must be simplified. Tools and libraries must make it easier to solve problems by being simpler and at the same time more sophisticated. We describe how grid computing can be achieved through spreadsheets. No parallel programming or complex tools need to be used. So long as dependencies allow it, formulae in a spreadsheet can be evaluated concurrently on the grid. Thus, grid computing becomes accessible to all those who can use a spreadsheet. The story is completed with a sophisticated backend system, NetSolve, which can solve complex linear algebra systems with minimal intervention from the user. We present the architecture of the system for performing such simple yet sophisticated grid computing and a case study which performs a large singular value decomposition.
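The idea that formulae can be evaluated concurrently whenever their dependencies allow it can be sketched as follows. This is an illustration under stated assumptions, not the system described in the abstract: a local thread pool stands in for grid resources, NetSolve is not called, and the `evaluate_sheet` function and the cell dictionary format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_sheet(cells, max_workers=4):
    """Evaluate cells in dependency order, running independent cells together.

    `cells` maps a cell name to (dependency names, function of their values).
    """
    values = {}
    pending = dict(cells)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Cells whose inputs are all known can run in the same wave.
            ready = [name for name, (deps, _) in pending.items()
                     if all(d in values for d in deps)]
            if not ready:
                raise ValueError("circular or missing dependency")
            futures = {}
            for name in ready:
                deps, formula = pending[name]
                futures[name] = pool.submit(formula, *[values[d] for d in deps])
            for name, future in futures.items():
                values[name] = future.result()
                del pending[name]
    return values


sheet = {
    "A1": ((), lambda: 2),
    "A2": ((), lambda: 3),
    "B1": (("A1", "A2"), lambda a, b: a + b),
    "B2": (("A1",), lambda a: a * 10),
    "C1": (("B1", "B2"), lambda x, y: x * y),
}
result = evaluate_sheet(sheet)
print(result["C1"])  # 100
```

Here `A1` and `A2` form the first wave, `B1` and `B2` the second, and `C1` the third; only cells in the same wave are submitted together.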
This paper reports on the design, implementation and performance evaluation of a suite of GridRPC programming middleware called Ninf-G Version 2 (Ninf-G2). Ninf-G2 is a reference implementation of the GridRPC API, a proposed GGF standard. Ninf-G2 has been designed to provide 1) high performance in a large-scale computational Grid, 2) the rich functionality required to compensate for the heterogeneity and unreliability of a Grid environment, and 3) an API which supports easy development and execution of Grid applications. Ninf-G2 is implemented to work with basic Grid services, such as GSI, GRAM, and MDS in the Globus Toolkit version 2. The performance of Ninf-G2 was evaluated using a weather forecasting system which was developed using Ninf-G2. The experimental results indicate that high performance can be attained even in relatively fine-grained task-parallel applications on hundreds of processors in a Grid environment.
Parallel/distributed application development is a very difficult task for non-expert programmers, and therefore support tools are needed for all phases of the development cycle of this kind of application. Developing applications using predefined programming structures (frameworks) should be easier than doing it from scratch. We propose to take advantage of knowledge about the structure of the application in order to develop a dynamic and automatic tuning tool. To this end, we have designed POETRIES, a dynamic performance tuning tool based on the idea that a performance model can be associated with the high-level structure of the application. This way, the tool can efficiently make better tuning decisions. Specifically, this work focuses on the definition of the performance model associated with applications developed with the master-worker framework.
Processor layout and data distribution are important to performance-oriented parallel computation, yet high-level language support that helps programmers address these issues is often inadequate. This paper presents a trio of abstract high-level language constructs - grids, distributions, and regions - that let programmers manipulate processor layout and data distribution. Grids abstract processor sets, regions abstract index sets, and distributions abstract mappings from index sets to processor sets; each of these is a first-class concept, supporting dynamic data reallocation and redistribution as well as dynamic manipulation of the processor set. This paper illustrates uses of these constructs in the solutions to several motivating parallel programming problems.
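The relationship between the three abstractions can be illustrated with a small sketch: a region is an index set, a grid is an arrangement of processors, and a distribution maps indices to processor coordinates. The code below is plain Python, not the language constructs the paper presents; the block distribution and the names `region`, `block_owner`, and `distribute` are assumptions chosen for illustration.

```python
from itertools import product


def region(rows, cols):
    """Index set of a rows x cols array."""
    return list(product(range(rows), range(cols)))


def block_owner(index, shape, grid):
    """Processor coordinates that own `index` under a block distribution."""
    (i, j), (n, m), (p, q) = index, shape, grid
    block_rows = -(-n // p)  # ceiling division
    block_cols = -(-m // q)
    return (i // block_rows, j // block_cols)


def distribute(shape, grid):
    """Map every processor in the grid to the indices it owns."""
    owned = {coords: [] for coords in product(range(grid[0]), range(grid[1]))}
    for index in region(*shape):
        owned[block_owner(index, shape, grid)].append(index)
    return owned


layout = distribute(shape=(4, 6), grid=(2, 2))
print(layout[(0, 1)])  # [(0, 3), (0, 4), (0, 5), (1, 3), (1, 4), (1, 5)]

# Changing the processor grid redistributes the same region.
relayout = distribute(shape=(4, 6), grid=(1, 3))
```

Because the grid is an ordinary value here, redistributing amounts to calling `distribute` again with a different grid, which loosely mirrors the dynamic manipulation of processor sets mentioned in the abstract.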
In this paper, we present "rules of thumb" for the efficient and straightforward parallelization of cellular neural networks (CNNs) processing image data on cluster architectures. The rules result from the application and optimization of a simple but effective structural data-parallel approach based on the SPMD model. Digital gray-scale images were used to evaluate the optimized parallel cellular neural network program. The algorithm is parallelized using HPF, which generates an MPI-based program.
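A common way to apply SPMD data parallelism to image data is to give each process a horizontal strip of rows plus a few neighbouring "halo" rows, since each cell's update depends on its local neighbourhood. The sketch below is written in Python rather than HPF and is not the paper's program; the function `strip_bounds` and the one-row halo are assumptions made for illustration.

```python
def strip_bounds(height, nprocs, rank, halo=1):
    """Rows owned by `rank`, plus the row range it must read including halo.

    Returns (first, last, read_first, read_last) with half-open ranges.
    """
    base, extra = divmod(height, nprocs)
    # The first `extra` ranks take one additional row each.
    first = rank * base + min(rank, extra)
    last = first + base + (1 if rank < extra else 0)
    return first, last, max(0, first - halo), min(height, last + halo)


for rank in range(3):
    print(rank, strip_bounds(height=10, nprocs=3, rank=rank))
```

Every process runs the same code on its own strip; only the halo rows need to be exchanged with neighbouring processes between iterations.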
In this paper, we propose a new method for parallel system design based on an extended logical coloured Petri net (LCPN). An LCPN is an extended Petri net that solves the system description problems of previously proposed place/transition nets and coloured Petri nets. This extension of Petri nets is suitable for designing complex control systems and for discussing methods of evaluating such systems realistically. To study the behaviour of a server system modelled with this net, we simulated it with a Java program. The simulation confirmed that this extended Petri net is an effective tool for modelling parallel systems.
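For readers unfamiliar with the underlying formalism, the following sketch simulates an ordinary place/transition net, the basic model that the LCPN extends. It does not implement the LCPN itself and is written in Python rather than the Java used by the authors; the two-transition server model and the names `enabled` and `fire` are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n
               for place, n in transition["in"].items())


def fire(marking, transition):
    """Return the marking after firing: consume input tokens, produce outputs."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for place, n in transition["in"].items():
        new[place] -= n
    for place, n in transition["out"].items():
        new[place] = new.get(place, 0) + n
    return new


# Hypothetical server model: a queued request is served only when the server is idle.
start = {"in": {"queue": 1, "idle": 1}, "out": {"busy": 1}}
finish = {"in": {"busy": 1}, "out": {"idle": 1, "done": 1}}

marking = {"queue": 2, "idle": 1, "busy": 0, "done": 0}
marking = fire(marking, start)
blocked = enabled(marking, start)  # False: the single server is busy
marking = fire(marking, finish)
marking = fire(marking, start)
marking = fire(marking, finish)
print(marking)  # {'queue': 0, 'idle': 1, 'busy': 0, 'done': 2}
```

The `idle` place acts as a resource token, so the second request cannot start until `finish` has returned the token.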
In this paper we examine how a network processor can be modeled using object-oriented techniques. We examine the Intel IXP 1200 network processor and discuss how the object-oriented language POOSL was utilized to allow an evaluation of a system before implementing it with hardware and software components. With the case study of the IXP 1200, we illustrate the suitability of object-oriented languages for system level modeling and design exploration.
The Cray X1 supercomputer is a distributed shared memory vector multiprocessor, scalable to 4096 processors and up to 65 terabytes of memory. The X1's hierarchical design uses the multi-streaming processor (MSP) as its basic building block; each MSP is capable of 12.8 GF/s for 64-bit operations. The distributed shared memory (DSM) of the X1 presents a 64-bit global address space that is directly addressable from every MSP, with an interconnect bandwidth per computation rate of one byte per floating point operation. Our results show that this high bandwidth and low latency for remote memory accesses translate into improved performance on important applications, such as an Eulerian gyrokinetic-Maxwell solver. Furthermore, this architecture naturally supports programming models like the Cray shmem API, Unified Parallel C (UPC), and Co-Array Fortran (CAF), and, as our benchmarks demonstrate, it is imperative to select the appropriate models to exploit these features.