ISBN (print): 0818682272
In this paper we present a runtime library design based on the two-phase collective I/O technique for irregular applications. The design is motivated by the requirements of a large number of ASCI (Accelerated Strategic Computing Initiative) applications, although the design and interface are general enough to be used by any irregular application. We present two designs, namely, "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O at the same time, which makes scheduling of I/O requests simpler but creates the possibility of contention at the I/O nodes. In the second approach, processors are divided into several groups, so that only one group performs I/O at a time while the next group performs communication to rearrange data, and this entire process is pipelined. This reduces the contention at the I/O nodes but requires more complicated scheduling and may degrade communication performance. We obtained up to 40 MBytes/sec application-level performance on Caltech's Intel Paragon (with 16 I/O nodes, each containing one disk), which includes on-the-fly reordering costs. We observed up to 60 MBytes/sec on the ASCI Red machine with only three I/O nodes (with RAIDs).
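To make the grouping idea concrete, here is a minimal Python sketch (not the authors' runtime library) of how a pipelined collective I/O schedule can stagger processor groups so that, in any round, one group accesses the I/O nodes while the next group rearranges data; the group sizes, phase names, and schedule layout are illustrative assumptions.

    # Sketch of "Pipelined Collective I/O" scheduling: only one group of
    # processors touches the I/O nodes per round, while the next group
    # communicates to rearrange its data, and the two phases are pipelined.
    # num_procs, num_groups and num_rounds are illustrative parameters.

    def pipelined_schedule(num_procs, num_groups, num_rounds):
        """Return, for each round, which processors do I/O and which communicate."""
        groups = [list(range(g, num_procs, num_groups)) for g in range(num_groups)]
        schedule = []
        for r in range(num_rounds):
            io_group = r % num_groups          # the only group at the I/O nodes this round
            comm_group = (r + 1) % num_groups  # the next group rearranges data meanwhile
            schedule.append({
                "round": r,
                "io": groups[io_group],
                "communicate": groups[comm_group],
            })
        return schedule

    if __name__ == "__main__":
        for step in pipelined_schedule(num_procs=8, num_groups=4, num_rounds=4):
            print(step)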
ISBN (print): 9781450337236
Dramatic advances in DNA sequencing technology have made it possible to study microbial environments by direct sequencing of environmental DNA samples. Yet, due to the huge volume and high data complexity, current de novo assemblers cannot handle large metagenomic datasets or fail to perform assembly with acceptable quality. This paper presents the first parallel solution for decomposing the metagenomic assembly problem without compromising the post-assembly quality. We transform this problem into that of finding weakly connected components in the de Bruijn graph. We propose a novel distributed memory algorithm to identify the connected subgraphs, and present strategies to minimize the communication volume. We demonstrate the scalability of our algorithm on a soil metagenome dataset with 1.8 billion reads. Our approach achieves a runtime of 22 minutes using 1280 Intel Xeon cores for a 421 GB uncompressed FASTQ dataset. Moreover, our solution is generalizable to finding connected components in arbitrary undirected graphs.
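As a toy illustration of the decomposition idea (not the paper's distributed-memory algorithm or its communication-minimizing strategies), the following Python sketch builds k-mer nodes from reads, links consecutive k-mers, and labels the weakly connected components with union-find; the value of k and the tiny read set are assumptions.

    # Serial sketch: treat each k-mer as a node, connect consecutive k-mers of a
    # read, and group nodes into weakly connected components with union-find.

    def kmers(read, k):
        return [read[i:i + k] for i in range(len(read) - k + 1)]

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def connected_components(reads, k=4):
        parent = {}
        for read in reads:
            ks = kmers(read, k)
            for node in ks:
                parent.setdefault(node, node)
            for a, b in zip(ks, ks[1:]):        # edge between consecutive k-mers
                ra, rb = find(parent, a), find(parent, b)
                if ra != rb:
                    parent[ra] = rb
        components = {}
        for node in parent:
            components.setdefault(find(parent, node), []).append(node)
        return list(components.values())

    if __name__ == "__main__":
        print(connected_components(["ACGTACGT", "TTTTGGGG"], k=4))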
ISBN (print): 3540338098
The aim of the paper is to introduce general techniques for optimizing the parallel execution time of sorting on distributed architectures with processors of various speeds. Such an application requires a partitioning step. For uniformly related processors (processor speeds are related by a constant factor), we develop a constant-time technique for mastering processor load and execution time in a heterogeneous environment, as well as a technique to deal with unknown cost functions. For non-uniformly related processors, we use a technique based on dynamic programming. Most of the time, the solutions are in O(p) (p is the number of processors), independent of the problem size n. Consequently, the overhead is small relative to the problem we deal with, but it is inherently limited by knowledge of the time complexity of the portion of code that follows the partitioning.
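For intuition, here is a minimal Python sketch of the partitioning step for uniformly related processors: each processor receives a slice of the input proportional to its relative speed, sorts it locally, and the sorted slices are merged. The speed vector and the final merge step are illustrative assumptions, not the constant-time technique of the paper.

    import heapq

    def partition_by_speed(data, speeds):
        """Split data into slices whose sizes are proportional to processor speeds."""
        total = sum(speeds)
        sizes = [round(len(data) * s / total) for s in speeds]
        sizes[-1] = len(data) - sum(sizes[:-1])   # absorb rounding error in the last slice
        chunks, start = [], 0
        for size in sizes:
            chunks.append(data[start:start + size])
            start += size
        return chunks

    def heterogeneous_sort(data, speeds):
        sorted_chunks = [sorted(chunk) for chunk in partition_by_speed(data, speeds)]
        return list(heapq.merge(*sorted_chunks))  # merge the locally sorted pieces

    if __name__ == "__main__":
        print(heterogeneous_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0], speeds=[1, 2, 2]))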
In this paper, we describe a new scheme for checkpointing parallel applications on message-passing scalable distributed memory systems. The novelty of our scheme is that a checkpointed application can be restored, fro...
ISBN (print): 9781605583518
The process of improving the performance of a parallel and distributed system by redistributing load among the processors is called load balancing. When it is done at run time, it is referred to as dynamic load balancing. In this paper we present a framework for obtaining a user-optimal load balancing scheme in a distributed system. We formulate the dynamic load balancing problem in distributed systems as a noncooperative game among users. The system model considers various parameters, and the load balancing algorithm is defined accordingly. For the proposed noncooperative load balancing game, we consider the structure of the Nash equilibrium. Based on this structure we derive a new distributed load balancing algorithm. Our focus is to define the load balancing problem and a scheme to overcome it using game theory. Copyright 2009 ACM.
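As a rough illustration of the game-theoretic view (not the authors' Nash-equilibrium-based algorithm), the following Python sketch lets each user repeatedly shift a small amount of its load from its most delayed server to the least delayed one until no such move helps, which approximates an equilibrium allocation; the M/M/1 delay model, step size, and server rates are assumptions.

    # Iterative load shifting between servers; each user acts in its own interest.

    def delay(load, rate):
        """Expected M/M/1 response time at a server (infinite if overloaded)."""
        return float("inf") if load >= rate else 1.0 / (rate - load)

    def best_response_balancing(user_loads, server_rates, step=0.1, iters=10000):
        # allocation[u][s] = amount of user u's load placed on server s (start uniform)
        allocation = [[load / len(server_rates)] * len(server_rates) for load in user_loads]
        for _ in range(iters):
            moved = False
            for row in allocation:
                totals = [sum(a[s] for a in allocation) for s in range(len(server_rates))]
                worst = max(range(len(row)), key=lambda s: delay(totals[s], server_rates[s]))
                best = min(range(len(row)), key=lambda s: delay(totals[s] + step, server_rates[s]))
                if (worst != best and row[worst] >= step and
                        delay(totals[best] + step, server_rates[best]) <
                        delay(totals[worst], server_rates[worst])):
                    row[worst] -= step      # shift a small chunk of load
                    row[best] += step
                    moved = True
            if not moved:
                break                        # no user can improve: approximate equilibrium
        return allocation

    if __name__ == "__main__":
        print(best_response_balancing(user_loads=[2.0, 1.0], server_rates=[3.0, 2.0]))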
ISBN (print): 0819424285
In a distributed computing environment, an important factor affecting the performance of a parallel algorithm is communication bandwidth. A new parallel volume rendering algorithm is presented in this paper, based on the shear-warp factorization of the viewing transformation and using a pipeline framework of PCs. By making full use of the overlap between communication and computation, we overcome the communication bottleneck. In existing algorithms based on object partitioning, local rendering and image composition are divided into two serial phases. Communication hardly happens during local rendering, whereas during image compositing communication is very busy and even congested; furthermore, there is a large synchronization overhead in this phase. This paper addresses this drawback by carrying out local rendering and image compositing concurrently through a pipeline of PCs. We have experimented on a pipeline consisting of 16 Pentiums. The results show that performance is not affected much by communication and that system overhead is small compared to rendering time. This paper provides a new method for studying low-cost, high-efficiency, real-time volume rendering systems.
ISBN (print): 9780889867307
Services are generally used in distributed computing platforms called "Grid Services", which combine Grid technology with the web service concept. In a Grid computing environment, several services are crucial; when these services fail, the computation can be terminated. Service reliability is therefore a requirement of the system. Thus, we propose a prototype of a reliable service model based on resiliency replication that uses forward recovery concepts. We study the model's performance and reliability and construct an experiment to validate our analysis against measured values. The results show that the model predicts performance very close to the actual values.
ISBN (print): 9783540680819
Scientific computing applications with highly demanding data capacity and computation power are driving a computing platform migration from shared memory machines to multi-core/multiprocessor computer clusters. However, the overhead of coordinating operations across computing nodes can counteract the benefit of having extra machines. Furthermore, hidden dependencies in applications slow down simulation over non-shared-memory machines. This paper proposes a framework to utilize multi-core/multiprocessor clusters for distributed simulation. Among several coordination schemes, the decentralized control approach has demonstrated its effectiveness in reducing communication overheads. A speculative execution strategy is applied to exploit parallelism thoroughly and to overcome strong data dependencies. Performance analysis and experiments are provided to demonstrate the performance gains.
ISBN (print): 9783540897361
OWL and RDF/RDFS are ontological languages developed by the World Wide Web Consortium (W3C), which have become a de facto standard for ontological descriptions in various domains. The evolution of these standards was influenced by numerous advances in research on knowledge representation and reasoning. Although support for reasoning and standardized representation is the key benefit of these technologies, there is a lack of existing test frameworks capable of addressing many crucial aspects of Semantic Web applications. In this paper we propose a methodology for automated testing of OWL reasoners based on real-world ontologies. The specification covers both terminological and assertional reasoning as well as checking the correctness of the answers. An open-source implementation of such a framework is described and a study of initial results is provided. The tests cover an extensive set of reasoners and ontologies and provide a state-of-the-art insight into the field of OWL reasoning.
This article presents a new distributed approach for generating all prime numbers up to a given limit. From Eratosthenes, who elaborated the first prime sieve more than 2000 years ago, to the advances of parallel computers (which have made it possible to reach larger limits or to obtain previous results in a shorter time), prime number generation still represents an attractive domain of research. Nowadays, prime numbers play a central role in cryptography, and interest in them has been increased by the very recent proof that primality testing is in P. In this work, we propose a new distributed algorithm which generates all prime numbers in a given finite interval [2,..., n], based on the wheel sieve. As far as we know, this paper presents the first fully distributed wheel sieve algorithm.
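For readers unfamiliar with the underlying technique, here is a minimal serial Python sketch of a wheel sieve (not the distributed algorithm of the paper): candidates are restricted to residues coprime to the wheel 2*3*5 = 30 and the remaining composites are crossed off as in Eratosthenes' sieve; the wheel basis {2, 3, 5} and the output format are illustrative choices.

    # Serial wheel sieve sketch: only numbers on the "spokes" coprime to the
    # wheel are kept as candidates, then composites among them are marked.

    def wheel_sieve(n, basis=(2, 3, 5)):
        wheel = 1
        for p in basis:
            wheel *= p
        spokes = [r for r in range(1, wheel + 1) if all(r % p for p in basis)]
        candidates = []
        for k in range(n // wheel + 1):
            for r in spokes:
                c = k * wheel + r
                if 1 < c <= n:            # skip 1 and anything beyond the limit
                    candidates.append(c)
        is_prime = {c: True for c in candidates}
        for c in candidates:
            if is_prime[c] and c * c <= n:
                for multiple in range(c * c, n + 1, c):
                    if multiple in is_prime:
                        is_prime[multiple] = False
        return [p for p in basis if p <= n] + [c for c in candidates if is_prime[c]]

    if __name__ == "__main__":
        print(wheel_sieve(100))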