ISBN: 1595936734 (print)
In desktop grids, the use of off-the-shelf shared components makes dedicated resources economically nonviable and complicates the design of the efficient storage systems needed to meet the exponentially growing storage demands of modern applications that run on these platforms. To address this challenge, we present PeerStripe, a storage system that transparently distributes files to storage space contributed by participants that have joined a peer-to-peer (p2p) network. PeerStripe uses structured p2p routing to yield a scalable, robust, reliable, and self-organizing storage system. The novelty of PeerStripe lies in its ingenious use of striping and error coding techniques in a heterogeneous distributed environment to store very large data files. Our evaluation of PeerStripe shows that it can achieve acceptable performance for applications in desktop grids.
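The abstract does not say which striping layout or error code PeerStripe uses. As a minimal sketch of the general idea only, the following splits a file into k stripes and adds a single XOR parity stripe, so that any one lost stripe can be rebuilt from the survivors. The function names and the choice of XOR parity are assumptions made for illustration, not PeerStripe's actual scheme.

```python
from functools import reduce
from typing import Optional


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def stripe(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length stripes (zero-padded) plus one parity stripe."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [xor_chunks(chunks)]


def recover(stripes: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing stripe (marked None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity tolerates only one lost stripe")
    repaired = list(stripes)
    if missing:
        survivors = [s for s in stripes if s is not None]
        repaired[missing[0]] = xor_chunks(survivors)
    return repaired
```

In a p2p setting each stripe would be placed on a different peer; a production system would more likely use a code that tolerates several simultaneous losses.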
ISBN: 1595936734 (print)
We present and analyze techniques to efficiently solve the partial content distribution problem: distributing a logical data set to receivers which individually desire only subsets of the total data. This is a more general and fundamentally different problem than traditional whole-file content distribution, providing new challenges and new optimization opportunities. It supports a wider variety of use models, e.g., striped file transfer, scatter/gather, or distributed editing. This work develops new metadata management and transfer scheduling techniques providing good results on high-performance networks. Distributed applications in such systems tend to have data requirements more complicated than just total overlap at every node: transfers desired differ dramatically from whole-file content distribution. Traditional approaches perform poorly in such cases. We provide empirical data exhibiting these limitations, evaluate a new BitTorrent-based implementation of our ideas, and show order-of-magnitude improvements in bandwidth and latency. Copyright 2007 ACM.
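To make the problem statement concrete, the sketch below maps each receiver's desired byte ranges onto fixed-size pieces (as in BitTorrent-style transfers) and records which receivers need which piece. The data layout and the names `pieces_for_range` and `demand_map` are hypothetical; the paper's metadata management and scheduling techniques are not described in the abstract.

```python
def pieces_for_range(offset: int, length: int, piece_size: int) -> set[int]:
    """Indices of the fixed-size pieces that overlap the byte range [offset, offset + length)."""
    if length <= 0:
        return set()
    first = offset // piece_size
    last = (offset + length - 1) // piece_size
    return set(range(first, last + 1))


def demand_map(wants: dict[str, list[tuple[int, int]]], piece_size: int) -> dict[int, set[str]]:
    """Map each piece index to the set of receivers that want at least part of it."""
    demand: dict[int, set[str]] = {}
    for receiver, ranges in wants.items():
        for offset, length in ranges:
            for piece in pieces_for_range(offset, length, piece_size):
                demand.setdefault(piece, set()).add(receiver)
    return demand
```

Pieces wanted by several receivers are candidates for peer-to-peer exchange, while pieces wanted by a single receiver only need to reach that one node.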
ISBN: 0769517722 (print)
ISAM is a proposal for resource management in heterogeneous networks, supporting physical and logical mobility, dynamic adaptation, and the execution of component-based distributed applications. To achieve its goals, ISAM adopts as its strategy an integrated environment that: (a) provides a programming paradigm and its execution environment; (b) handles the adaptation process through a multilevel collaborative model, in which both the system and the application contribute. In this paper we discuss the main mechanisms used to implement the ISAM features, and we also present a parallel application that explores some of these features.
ISBN: 0780390377 (print)
In this paper, a novel workflow-aware distributed versioning file system, WAD-VFS, is presented to overcome the shortcomings of traditional DFS and facilitate high-performance computing. Our preliminary simulation results are impressive and can hence serve as supporting evidence for deploying WAD-VFS in our ongoing metacomputing project, the Trellis system.
ISBN: 9781424412501 (print)
Large-scale online applications such as Massively Multiplayer Online Games (MMOGs) require a large amount of computing resources to support many players interacting simultaneously. Cluster computing is the technology most commonly used by online game design firms. Cluster computing is limited by the number and types of computers it can manage, and these computers are usually in the same geographical location. On the other hand, Grid computing offers large-scale high-performance distributed computing which connects various types of computing resources on the Internet. In this paper, we design a Grid computing platform called the Massively Multi-user Online Platform (MMOP). The objectives of this proposed design are to offer scalability, flexibility, and simplicity to the development processes of distributed applications. MMOP allows applications to execute based on specified policy rules, with dynamic addition of computing resources at run-time. Each application is managed separately, and multiple large-scale applications can share a single computing architecture. An online game has been built to test the functional behavior of the MMOP. The simulation results demonstrate that MMOP is a high-performance and scalable computing architecture.
ISBN: 0769520464 (print)
We consider the implementation of a parallel Monte Carlo code for high-performance simulations on PC clusters with MPI. We carry out tests of speedup and efficiency. The code is used for numerical simulations of pure SU(2) lattice gauge theory at very large lattice volumes, in order to study the infrared behavior of gluon and ghost propagators. This problem is directly related to the confinement of quarks and gluons in the physics of strong interactions.
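For reference, speedup and efficiency are conventionally defined as S = T_1 / T_p and E = S / p, where T_1 is the serial run time and T_p the run time on p processes. The helper names below are illustrative, and the abstract reports no timings, so no measured values are implied.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E = S / p; 1.0 means ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_procs
```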
ISBN: 0769520464 (print)
Researchers are constantly looking for ways to improve the execution time of parallel applications on distributed systems. Although compile-time static scheduling heuristics employ complex mechanisms, the quality of their schedules is handicapped by estimated run-time costs. On the other hand, while dynamic schedulers use actual run-time costs, they have to be of low complexity in order to reduce the scheduling overhead. This paper investigates the viability of integrating these two approaches into a hybrid scheduling framework. The relationships between static schedulers, dynamic heuristics, and scheduling events are examined. The results show that a hybrid scheduler can indeed improve the schedules produced by good traditional static list scheduling algorithms.
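One way to picture the hybrid idea is sketched below, under strong simplifying assumptions: tasks are independent (no dependency graph), the static phase only builds a priority list from estimated costs, and the dynamic phase hands each task to whichever processor becomes free first according to actual costs. This is not the framework from the paper, whose details the abstract does not give; `hybrid_schedule` and its parameters are hypothetical.

```python
import heapq


def hybrid_schedule(estimated: dict[str, float], actual: dict[str, float], n_procs: int):
    """Return ({task: (processor, start_time)}, makespan) for independent tasks."""
    # Static phase: order tasks by estimated cost, longest first (ties broken by name).
    order = sorted(estimated, key=lambda task: (-estimated[task], task))

    # Dynamic phase: dispatch in that order to the processor that frees up first,
    # advancing time with the actual (run-time) cost of each task.
    free_at = [(0.0, proc) for proc in range(n_procs)]
    heapq.heapify(free_at)
    placement = {}
    for task in order:
        start, proc = heapq.heappop(free_at)
        placement[task] = (proc, start)
        heapq.heappush(free_at, (start + actual[task], proc))

    makespan = max(finish for finish, _ in free_at)
    return placement, makespan
```

With estimates `{"a": 4, "b": 3, "c": 2}`, actual costs `{"a": 1, "b": 3, "c": 2}`, and two processors, task `c` starts on processor 0 as soon as `a` finishes early, instead of queuing behind `b` as a purely estimate-driven placement would.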
ISBN: 9781467355872 (print)
The peak compute performance of GPUs has been increased by integrating more compute resources and operating them at higher frequency. However, such approaches significantly increase the power consumption of GPUs, limiting further performance increases due to the power constraint. Facing this challenge, we propose three techniques to improve the power efficiency and performance of GPUs in this paper. First, we observe that many GPGPU applications are integer-intensive. For such applications, we combine a pair of dependent integer instructions into a composite instruction that can be executed by an enhanced fused multiply-add unit. Second, we observe that computations for many instructions are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar unit. Finally, we observe that 16 or fewer bits are sufficient for accurate representation of the operands and results of many instructions. Thus, we split the 32-bit datapath into two 16-bit datapath slices that can concurrently issue and execute up to two such instructions per cycle. All three proposed techniques can considerably increase utilization of compute resources, improving power efficiency and performance by 20% and 15%, respectively.
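The second and third observations can be expressed as simple predicates. The sketch below is a software illustration only, assuming unsigned integer values and a warp represented as a list of per-thread operand tuples; the function names are hypothetical, and the paper's hardware detection logic is not described in the abstract.

```python
def fits_in_16_bits(value: int) -> bool:
    """True if an unsigned value is representable in 16 bits."""
    return 0 <= value < (1 << 16)


def can_use_half_slice(operand_a: int, operand_b: int, result: int) -> bool:
    """An instruction qualifies for a 16-bit slice if both operands and the result fit."""
    return all(fits_in_16_bits(v) for v in (operand_a, operand_b, result))


def is_scalar_instruction(operands_per_thread: list[tuple[int, ...]]) -> bool:
    """True if every thread supplies identical operands, so one scalar execution suffices."""
    return len(set(operands_per_thread)) == 1
```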
ISBN: 1595936734 (print)
Desktop grids have evolved to combine Peer-to-Peer and Grid computing techniques to improve the robustness, reliability, and scalability of job execution infrastructures. However, efficiently matching incoming jobs to available system resources and achieving good load balance in a fully decentralized and heterogeneous computing environment is a challenging problem. In this paper, we extend our prior work with a new decentralized algorithm for maintaining approximate global load information, and a job pushing mechanism that uses the global information to push jobs towards underutilized portions of the system. The resulting system more effectively balances load and improves overall system throughput. Through a comparative analysis of experimental results across different system configurations and job profiles, performed via simulation, we show that our system can reliably execute Grid applications on a distributed set of resources both with low cost and with good load balance. Copyright 2007 ACM.
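As a rough illustration of job pushing, the sketch below forwards a job hop by hop to the least-loaded neighbor until no neighbor reports a lower load. It assumes each node holds an approximate load value for its neighbors; the greedy rule and the name `push_job` are assumptions for illustration, not the algorithm from the paper.

```python
def push_job(start: str, neighbors: dict[str, list[str]], load: dict[str, float]) -> str:
    """Return the node where a job settles after greedy pushing toward lower load."""
    node = start
    while True:
        # Pick the least-loaded neighbor; a node with no neighbors keeps the job.
        best = min(neighbors[node], key=lambda n: load[n], default=node)
        if load[best] >= load[node]:
            return node  # no neighbor is strictly less loaded
        node = best      # load strictly decreases on every hop, so the loop terminates
```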
ISBN: 9781538637906 (print)
Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important, because applications with large-scale matrix multiplication are increasingly common, especially in the area of high-performance computing (HPC). However, existing computing platforms including CPU, GPGPU, and FPGA suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication, and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which is well suited to the requirements of HPC applications.
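If efficiency is taken to mean achieved performance divided by theoretical peak (an assumption; the abstract does not define it), the reported figures imply a peak of roughly 768 GFLOPS, or about 3 GFLOPS per PE. The short calculation below derives these values from the numbers in the abstract; the function name is illustrative.

```python
def implied_peak_gflops(achieved_gflops: float, efficiency: float) -> float:
    """Peak performance implied by achieved / peak = efficiency (assumed definition)."""
    return achieved_gflops / efficiency


peak = implied_peak_gflops(767.99, 0.9999)  # figures taken from the abstract
per_pe = peak / 256                         # 256 processing elements
```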