ISBN: 9780769543284 (print)
We present ongoing work on accurately predicting the performance of distributed applications in heterogeneous systems. We are developing dPerf, a tool built on the Rose framework that performs static analysis and automatic instrumentation of the input source code of programs written in C, C++ or Fortran. The accuracy of the predicted computation time rests on hardware counters and on two block benchmarking techniques that we propose in this paper. A network simulator is used to calculate the communication time. The computation and communication times are then summed to obtain an estimate of the distributed application's execution time. The approach is validated experimentally on the NAS Integer Sort benchmark, with communications simulated by SimGrid.
ISBN: 1932415262 (print)
Sorting is a fundamental algorithm used extensively in computer science as an intermediate step in many applications. The performance of sorting algorithms is heavily influenced by the type of data being sorted and by the machine being used. To assist in obtaining portable performance for sorting algorithms, we propose an install-time system for automatically constructing sequential and parallel sorts that are highly tuned for the target architecture. Our system has two steps: first, a hybrid sequential divide-and-conquer sort is constructed; then this algorithm is parallelized using a shared work-queue model. To evaluate our system, we compare automatically generated sorting algorithms to sequential and parallel versions of the C++ STL sort. The generated sorts are shown to be competitive with the STL sort on sequential systems and to outperform the parallel STL sort on a 4-processor Xeon server.
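As a rough illustration of the first step, the sketch below builds a hybrid divide-and-conquer sort and picks its switch-over point by timing on the local machine. The abstract does not say which algorithms are combined or how tuning is done, so the pairing of merge sort with insertion sort, the `cutoff` parameter, and the `choose_cutoff` helper are all assumptions made for illustration, and the work-queue parallelization is not shown.

```python
import timeit


def insertion_sort(a, lo, hi):
    """Sort the slice a[lo:hi] in place; efficient for short ranges."""
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def hybrid_sort(a, cutoff=16, lo=0, hi=None):
    """Merge sort that hands ranges of at most `cutoff` items to insertion sort.

    `cutoff` is a hypothetical tuning knob, not a parameter from the paper.
    Sorts `a` in place and returns it.
    """
    if hi is None:
        hi = len(a)
    # max(..., 1) guarantees termination even if cutoff is 0 or negative.
    if hi - lo <= max(cutoff, 1):
        insertion_sort(a, lo, hi)
        return a
    mid = (lo + hi) // 2
    hybrid_sort(a, cutoff, lo, mid)
    hybrid_sort(a, cutoff, mid, hi)
    # Merge the two sorted halves back into a[lo:hi].
    left, right = a[lo:mid], a[mid:hi]
    i = j = 0
    k = lo
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1
        k += 1
    a[k:hi] = left[i:] + right[j:]  # exactly one of these tails is non-empty
    return a


def choose_cutoff(sample, candidates=(4, 8, 16, 32, 64)):
    """Assumed stand-in for install-time tuning: time each candidate locally."""
    def cost(c):
        return timeit.timeit(lambda: hybrid_sort(list(sample), cutoff=c), number=3)
    return min(candidates, key=cost)
```

Calling `choose_cutoff` once on representative data and reusing the result mirrors the idea of tuning at install time: the best value depends on the machine and the data, so none is hard-coded here.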
ISBN: 0769516807 (print)
Different parallelization methods vary in their system requirements, programming styles, efficiency in exploiting parallelism, and the application characteristics they can handle. Different applications can exhibit totally different performance gains depending on the parallelization method used. This paper compares OpenMP, MPI, and Strings (a distributed shared memory system) for parallelizing a complicated tribology problem. The problem size and computing infrastructure are varied, and their impact on the parallelization methods is studied. All of the methods studied exhibit good performance improvements. The paper demonstrates the benefits of applying parallelization techniques to applications in this field.
ISBN: 9781467387767 (print)
Improving the memory access behavior of parallel applications is one of the most important challenges in high-performance computing. Non-Uniform Memory Access (NUMA) architectures pose particular challenges in this context: they contain multiple memory controllers, and the selection of a controller to serve a page request influences the overall locality and balance of memory accesses, which in turn affect performance. In this paper, we analyze and improve the memory access pattern and overall memory usage of large-scale irregular applications on NUMA machines. We selected HashSieve, a very important algorithm in the context of lattice-based cryptography, as a representative example, due to (1) its extremely irregular memory pattern, (2) its large memory requirements and (3) its unsuitability for other computer architectures, such as GPUs. We optimize HashSieve with a variety of techniques, focusing both on the algorithm itself and on the mapping of memory pages to NUMA nodes, achieving a speedup of over 2x.
ISBN: 1932415262 (print)
Debugging helps programmers locate the cause of incorrect program behavior. The executions of parallel programs are much more complex than those of serial ones, which makes parallel programs difficult to debug. In contrast with traditional parallel and distributed computing environments, computational grids exhibit new characteristics, such as a high degree of heterogeneity, dynamic behavior, and security constraints. These features make debugging grid applications challenging. In this paper we design and implement a grid-enabled parallel debugging environment that simplifies debugging grid applications. We present the concept of an ad hoc computing environment in grids and a method by which this environment can be built automatically, restricted to MPI-G2 applications. Capabilities including user identification, automatic task submission, and resource registration and collection can be accomplished easily through the portal. The debugging functionalities include consistent global state and race detection.
ISBN: 1892512459 (print)
The need to continue to work in a mobile environment raises the problem of data availability in the presence of disconnections. Our approach to this problem is to replicate data and code locally on the mobile terminal. The system and the applications should also react to changes in the mobile environment. The work presented in this paper is the continuation of Domint [6, 7], a platform to cope with disconnections in mobile environments for CORBA-based applications. In this paper, we go further in the study of the role of disconnected entities by proposing meta-data to build patterns in order to (1) choose which entities of the distributed application must become disconnected, and (2) state whether disconnected entities are necessary for execution while disconnected. We also outline the integration of these meta-data into the CORBA component-based architecture.
ISBN: 1601320841 (print)
In this paper we introduce a new parallel programming language, Parallel C#, whose main feature is the combination of chords and higher-order functions in one language. The language extends the standard syntax of C# for the needs of parallel programming and simplifies the task of writing complex multithreaded and distributed applications. We describe the design of the language and give examples of its use in addressing a range of concurrent programming problems. We also introduce new distributed runtime systems for this language, for both Windows and Linux machines.
ISBN: 1892512459 (print)
In this paper, we investigate the load balancing problem among a cluster of mobile and fixed devices in a voice-enabled interface. We consider a design approach. The voice interface has to support up to a hundred simultaneous users. The load balancing criteria we consider are defined, on the one hand, in terms of network, CPU and memory resources, and on the other hand, in terms of the boundary between fixed and mobile devices. The solution we propose is based on a derivative of the Linux Virtual Server: the Piranha system. Piranha enhances the Linux Virtual Server with several features, in particular monitoring and fault tolerance. We describe the proposed architecture of the application in detail. We present a new scheduling technique designed to take resource loads into account dynamically.
ISBN: 1892512459 (print)
This paper presents a massively shared virtual reality system based on a network of peers. It relies neither on a server nor on IP multicast, and is intended to scale to an unlimited number of participants. Following a peer-to-peer scheme, entities collaborate to build up a common virtual world. The behavior of the entities, which run algorithms to maintain local properties, ensures the consistency of the virtual world and the connectivity of the network. The paper also describes how entities join the network and enter the virtual world at a particular position.
ISBN: 1892512459 (print)
The cutting number of a node i in a connected graph G is the number of pairs of nodes in different components of G - {i}. The cutting center is the set of nodes of G with maximal cutting number. This paper presents a self-stabilizing algorithm for finding the cutting numbers of all nodes of a tree T = <V_T, E_T>, and hence the cutting center of T. It is shown that the proposed self-stabilizing algorithm requires O(n^2) moves. The algorithm complexity can also be expressed as O(n) rounds.
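The definitions above can be made concrete with a small sketch. This is a plain centralized computation of cutting numbers on a tree, not the self-stabilizing algorithm the paper proposes; the function names, the edge-list input, and the 0..n-1 node labelling are assumptions made for illustration. It uses the fact that removing node i from a tree leaves components of sizes s_1, ..., s_k that sum to n - 1, so the number of pairs in different components is ((n - 1)^2 - (s_1^2 + ... + s_k^2)) / 2.

```python
from collections import defaultdict


def cutting_numbers(n, edges):
    """Return the cutting number of each node of a tree on nodes 0..n-1."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS from node 0: record each node's parent and a visit
    # order in which every node appears after its parent.
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order = []
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)

    # Subtree sizes, accumulated from the leaves upward.
    size = [1] * n
    for u in reversed(order):
        if parent[u] != -1:
            size[parent[u]] += size[u]

    result = [0] * n
    for u in range(n):
        # Components of T - {u}: one per child subtree, plus the part
        # of the tree that hangs off u's parent (if u is not the root).
        parts = [size[v] for v in adj[u] if parent[v] == u]
        if parent[u] != -1:
            parts.append(n - size[u])
        total = n - 1
        result[u] = (total * total - sum(p * p for p in parts)) // 2
    return result


def cutting_center(n, edges):
    """Return the nodes whose cutting number is maximal."""
    numbers = cutting_numbers(n, edges)
    best = max(numbers)
    return [u for u in range(n) if numbers[u] == best]
```

For the path 0-1-2-3, removing node 1 separates {0} from {2, 3}, which gives 1 × 2 = 2 separated pairs; node 2 is symmetric, and the two end nodes separate nothing, so the cutting center is {1, 2}.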