ISBN: (print) 0769516262
Metacomputing so far has been done mostly in more or less static configurations. However, applications with dynamic, irregular behavior are increasing in significance, and computing platforms are more often time-sharing environments with varying system load. Thus, possibilities for dynamic connection and dynamic workload migration are becoming important. This paper discusses an approach to asynchronous workload balancing using the standard parallel library MPI. MPI and threads typically live in more or less separate worlds, and the thread extension of MPI-2 is mainly meant to exploit each SMP node more efficiently within a model that is still mostly SPMD. We have extended MPI with dynamic mechanisms that automatically balance workload on the basis of threads and dynamic status/resource monitoring. Our extended library TeMPI is designed to run, in a minimum version, with MPICH and thus with MPICH-G2 in the Globus grid environment.
ISBN: (print) 0769520464
The X4CP32 is a parallel/reconfigurable microprocessor with two programming levels. Although it is a general-purpose microprocessor, it has the reliable performance of a reconfigurable architecture. This paper presents its architecture and programming levels, and discusses the powerful interaction between parallel programming and reconfiguration. It shows two performance-optimized implementations of matrix multiplication using both parallel and reconfigurable paradigms and a parallel implementation of miner intelligent agents.
ISBN: (print) 9781424412501
Large-scale online applications such as Massively Multiplayer Online Games (MMOGs) require a large amount of computing resources to support many players interacting simultaneously. Cluster computing is the technology most used by online game design firms. Cluster computing is limited by the number and types of computers it can manage, and these computers are usually in the same geographical location. Grid computing, on the other hand, offers large-scale, high-performance distributed computing that connects various types of computing resources on the Internet. In this paper, we design a Grid computing platform called the Massively Multi-user Online Platform (MMOP). The objectives of the proposed design are to bring scalability, flexibility, and simplicity to the development of distributed applications. MMOP allows applications to execute according to specified policy rules, with dynamic addition of computing resources at run-time. Each application is managed separately, and multiple large-scale applications can share a single computing architecture. An online game has been built to test the functional behavior of the MMOP. In the simulation results, the MMOP is shown to be a high-performance and scalable computing architecture.
ISBN: (print) 1595936734
Desktop grids have evolved to combine Peer-to-Peer and Grid computing techniques to improve the robustness, reliability and scalability of job execution infrastructures. However, efficiently matching incoming jobs to available system resources and achieving good load balance in a fully decentralized and heterogeneous computing environment is a challenging problem. In this paper, we extend our prior work with a new decentralized algorithm for maintaining approximate global load information, and a job pushing mechanism that uses the global information to push jobs towards underutilized portions of the system. The resulting system balances load more effectively and improves overall system throughput. Through a comparative analysis of experimental results across different system configurations and job profiles, performed via simulation, we show that our system can reliably execute Grid applications on a distributed set of resources both with low cost and with good load balance. Copyright 2007 ACM.
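The idea of pushing a job towards underutilized resources can be illustrated with a small sketch. This is not the algorithm from the paper: the function `push_job`, the `load` and `neighbors` dictionaries, and the hop limit are all hypothetical, and the sketch assumes each node can read an approximate load value for its direct neighbors.

```python
def push_job(start, load, neighbors, max_hops=8):
    """Greedily forward a job to the least-loaded neighbor (illustrative only).

    load      -- dict mapping node name to its (approximate) number of queued jobs
    neighbors -- dict mapping node name to a list of adjacent node names
    Both structures are assumptions made for this sketch, not the paper's design.
    """
    node = start
    for _ in range(max_hops):
        # Pick the neighbor with the smallest known load; stay put if there are none.
        best = min(neighbors[node], key=lambda n: load[n], default=node)
        if load[best] >= load[node]:
            break  # no neighbor is less loaded, so the job runs here
        node = best
    load[node] += 1  # the chosen node accepts the job
    return node


# Tiny three-node chain: a - b - c, with c the least loaded.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
load = {"a": 5, "b": 3, "c": 1}
target = push_job("a", load, neighbors)
```

A greedy walk like this can stall in a local minimum, which is one reason a system might maintain approximate global load information rather than rely on neighbor loads alone.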
ISBN: (print) 0769517722
Computer simulation is nowadays one of the most important tools for correctly understanding physical phenomena. In this work we analyze the performance improvement obtained by parallelizing an algorithm used to simulate electronic properties of semiconductor systems.
ISBN: (print) 0769517722
The work presented in this paper consists of a tool developed to support the prototyping of a TINA system. The tool automatically generates Java code for a general TINA system whose objects were previously described in SDL. The generated code is a fully functional distributed system that uses CORBA as its distributed environment.
ISBN: (print) 9781538621622
In this paper we consider the problem of programming for heterogeneous computer systems consisting of CPUs and various accelerating devices such as GPUs. We introduce a few of the most popular models for heterogeneous parallel programming, including OpenCL (Open Computing Language), CUDA (Compute Unified Device Architecture), OpenACC, OpenHMPP (Hybrid Multicore Parallel Programming), C++ AMP (Accelerated Massive Parallelism), HPL (Heterogeneous Programming Library), etc.
ISBN: (print) 9781538666142
Sparse triangular solver (SpTRSV) is an important and indispensable building block for many scientific applications. The parallelism of SpTRSV is exploited in the literature using the level-set method; however, this method still suffers from high synchronization cost and irregular global memory access, especially on many-core architectures such as Sunway. In this paper, we propose an efficient implementation of SpTRSV that uses the massive computing resources of the Sunway architecture. Specifically, we divide the 64 CPEs in a core group into three different roles: worker, router, and storer. We also build a logical shared memory by carefully manipulating the scratchpad memory located in each storer, and allow synchronization using the unique register communication of the Sunway architecture. We partition the sparse matrix into multiple bands and replace the irregular global memory accesses with shared memory accesses, which significantly improves data locality during the calculation of a band. Our experiments with 12 representative datasets demonstrate that our approach achieves up to 5.14x (2.65x on average) speedup.
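For readers unfamiliar with the level-set method mentioned above, the following is a minimal serial sketch of the general technique, not the Sunway implementation proposed in the paper. The row format (a list of `(column, value)` pairs per row, diagonal included) and the function names are assumptions chosen for illustration.

```python
def level_sets(rows):
    """Group the rows of a sparse lower-triangular matrix into dependency levels.

    rows[i] is a list of (column, value) pairs for row i, diagonal included.
    Row i depends on every row j < i that appears in it, so its level is one
    more than the deepest level among those rows.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j, _ in row if j < i]
        level[i] = 1 + max(deps) if deps else 0
    groups = {}
    for i, lv in enumerate(level):
        groups.setdefault(lv, []).append(i)
    return [groups[lv] for lv in sorted(groups)]


def sptrsv(rows, b):
    """Solve L x = b by forward substitution, one level at a time."""
    x = [0.0] * len(rows)
    for group in level_sets(rows):
        # Rows within one group are mutually independent, so a parallel
        # implementation could process them concurrently and synchronize
        # only between groups.
        for i in group:
            acc, diag = b[i], None
            for j, v in rows[i]:
                if j == i:
                    diag = v
                else:
                    acc -= v * x[j]
            x[i] = acc / diag
    return x


# 4x4 example: rows 0 and 1 are independent, row 2 needs row 0, row 3 needs rows 1 and 2.
rows = [
    [(0, 2.0)],
    [(1, 4.0)],
    [(0, 1.0), (2, 1.0)],
    [(1, 1.0), (2, 1.0), (3, 2.0)],
]
b = [2.0, 8.0, 4.0, 9.0]
x = sptrsv(rows, b)
```

The barrier between consecutive levels is the synchronization cost the abstract refers to, and the reads of `x[j]` at scattered indices correspond to the irregular memory accesses.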
ISBN: (print) 9781538637906
Clustering plays an essential role in large-volume data analysis areas such as bioinformatics, statistics, pattern recognition and so on. K-means is one of the most effective clustering algorithms, and it is relatively easy to implement. Most real-world applications involve a huge amount of data. Thus, how to improve an application's efficiency while maintaining accuracy becomes a significant issue. In this paper, a K-means clustering algorithm is proposed that uses heterogeneous parallel computing technology on computing processing elements together with distributed computing technology. The algorithm is applied on the unique Sunway architecture of the "Sunway TaihuLight" supercomputer, the world's fastest supercomputer, with peak performance over 100 PFLOPS. The test results suggest that this improved algorithm is stable, fast and efficient. In conclusion, it greatly improves computation performance, especially with large volumes of data.
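As background, the standard serial K-means iteration (Lloyd's algorithm) can be sketched as follows. This is the textbook baseline, not the heterogeneous parallel variant proposed in the paper; the function names and the choice to pass initial centroids explicitly are assumptions made for this sketch.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iterations=100):
    """Plain Lloyd's algorithm; returns the final centroids and clusters."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid. Points are
        # handled independently, which is what makes this step data-parallel.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: squared_distance(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for k, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroids[k])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
```

Because the distance computations in the assignment step dominate the cost for large inputs, that step is the usual target when distributing K-means across many processing elements.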