ISBN: 0769520464 (print)
The X4CP32 is a parallel/reconfigurable microprocessor with two programming levels. Although it is a general-purpose microprocessor, it has the reliable performance of a reconfigurable architecture. This paper presents its architecture and programming levels, and discusses the powerful interaction between parallel programming and reconfiguration. It shows two performance-optimized implementations of matrix multiplication using both parallel and reconfigurable paradigms and a parallel implementation of miner intelligent agents.
ISBN: 9781424412501 (print)
Large-scale online applications such as Massively Multiplayer Online Games (MMOGs) require large amounts of computing resources to support many players interacting simultaneously. Cluster computing is the technology most commonly used by online game design firms. Cluster computing is limited by the number and types of computers it can manage, and these computers are usually in the same geographical location. Grid computing, on the other hand, offers large-scale high-performance distributed computing that connects various types of computing resources on the Internet. In this paper, we design a Grid computing platform called the Massively Multi-user Online Platform (MMOP). The objectives of the proposed design are to offer scalability, flexibility, and simplicity to the development processes of distributed applications. MMOP allows applications to be executed according to specified policy rules, with dynamic addition of computing resources at run-time. Each application is managed separately, and multiple large-scale applications can share a single computing architecture. An online game has been built to test the functional behavior of the MMOP. In the simulation results, the MMOP is shown to be a high-performance and scalable computing architecture.
ISBN: 1595936734 (print)
Desktop grids have evolved to combine Peer-to-Peer and Grid computing techniques to improve the robustness, reliability and scalability of job execution infrastructures. However, efficiently matching incoming jobs to available system resources and achieving good load balance in a fully decentralized and heterogeneous computing environment is a challenging problem. In this paper, we extend our prior work with a new decentralized algorithm for maintaining approximate global load information, and a job pushing mechanism that uses the global information to push jobs towards underutilized portions of the system. The resulting system more effectively balances load and improves overall system throughput. Through a comparative analysis of experimental results across different system configurations and job profiles, performed via simulation, we show that our system can reliably execute Grid applications on a distributed set of resources both with low cost and with good load balance. Copyright 2007 ACM.
ISBN: 0769517722 (print)
Computer simulation is nowadays one of the most important tools for the correct understanding of physical phenomena. In this work we analyze the performance improvement obtained by parallelizing an algorithm used to simulate electronic properties of semiconductor systems.
ISBN: 0769517722 (print)
The work presented in this paper consists of a tool developed to assist in prototyping a TINA system. The tool automatically generates Java code for a general TINA system whose objects were previously described in the SDL language. The generated code is a fully functional distributed system that uses CORBA as its distributed environment.
ISBN: 9781538621622 (print)
In this paper we consider the problem of programming for heterogeneous computer systems consisting of CPUs and various accelerating devices such as GPUs. We introduce a few of the most popular models for heterogeneous parallel programming, including OpenCL (Open Computing Language), CUDA (Compute Unified Device Architecture), OpenACC, OpenHMPP (Hybrid Multicore Parallel Programming), C++ AMP (Accelerated Massive Parallelism), HPL (Heterogeneous Programming Library), etc.
ISBN: 9781538666142 (print)
Sparse triangular solver (SpTRSV) is an important and indispensable building block for many scientific applications. In the literature, the parallelism of SpTRSV is exploited using the level-set method; however, this method still suffers from high synchronization cost and irregular global memory access, especially on many-core architectures such as Sunway. In this paper, we propose an efficient implementation of SpTRSV using the massive computing resources of the Sunway architecture. Specifically, we divide the 64 CPEs in a core group into three different roles: worker, router, and storer. We also build a logical shared memory by carefully manipulating the scratchpad memory located in each storer, and allow synchronization using the unique register communication of the Sunway architecture. We partition the sparse matrix into multiple bands and replace the irregular global memory accesses with shared memory accesses, which significantly improves data locality during the calculation of a band. Our experiments with 12 representative datasets demonstrate that our approach achieves up to 5.14x (2.65x on average) speedup.
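The level-set method mentioned in this abstract can be illustrated with a minimal serial sketch. It assumes a lower-triangular matrix in CSR form with a nonzero diagonal; the function names `level_sets` and `sptrsv_level_set` are hypothetical, and this is the generic baseline technique, not the paper's Sunway implementation with worker, router, and storer roles.

```python
def level_sets(indptr, indices):
    """Group rows of a lower-triangular CSR matrix into dependency levels.

    A row's level is one more than the highest level among the rows it
    depends on (its off-diagonal column indices), or 0 if it has none.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups


def sptrsv_level_set(indptr, indices, data, b):
    """Solve L x = b, processing rows one level at a time."""
    x = [0.0] * len(b)
    for rows in level_sets(indptr, indices):
        # Rows inside one level are mutually independent, so a parallel
        # version would distribute this inner loop across cores and place
        # a synchronization point between consecutive levels.
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                elif j < i:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x


# Example: L = [[2, 0, 0], [0, 4, 0], [1, 1, 1]], b = [2, 8, 6]
indptr = [0, 1, 2, 5]
indices = [0, 1, 0, 1, 2]
data = [2.0, 4.0, 1.0, 1.0, 1.0]
x = sptrsv_level_set(indptr, indices, data, [2.0, 8.0, 6.0])
```

The per-level barrier and the scattered reads of `x[j]` correspond to the synchronization cost and irregular memory access that the abstract identifies as weaknesses of this method.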
ISBN: 9781538637906 (print)
Clustering plays an essential role in large-volume data analysis areas such as bioinformatics, statistics, pattern recognition and so on. K-means is one of the most effective clustering algorithms and is relatively easy to implement. Most real-world applications involve a huge amount of data. Thus, improving application efficiency while maintaining accuracy becomes a significant issue. In this paper, a K-means clustering algorithm is proposed that uses heterogeneous parallel computing technology on computing processing elements together with distributed computing technology. The algorithm is applied on the unique Sunway architecture of the "Sunway TaihuLight" supercomputer, the world's fastest supercomputer, with peak performance over 100 PFLOPS. The test results suggest that the improved algorithm is stable, fast and efficient. In conclusion, it greatly improves computation performance, especially with large volumes of data.
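For readers unfamiliar with K-means, the following is a minimal serial sketch of the standard iteration (assign each point to its nearest centroid, then recompute centroids as cluster means). The function name `kmeans` and its parameters are hypothetical, and this is the textbook algorithm rather than the heterogeneous parallel version the abstract proposes.

```python
def kmeans(points, centroids, iterations=10):
    """Run the standard K-means iteration from the given initial centroids.

    points and centroids are lists of equal-length numeric tuples.
    """
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid. Points
        # are handled independently, which is the part a parallel version
        # would typically distribute.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)

        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
```

In a distributed setting, each node would usually compute partial sums and counts per cluster for its share of the points, and a reduction would combine them before the update step.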
ISBN: 0818607033 (print)
FTCX is an experimental fault-tolerant computer architecture intended to serve as a general-purpose real-time computing system for fault-sensitive supervisory and control applications. FTCX uses tightly synchronous triplex computation in its core to detect and mask all first faults. Synchronization, fault detection, and fault correction are all performed in hardware. Novel to this architecture are the means by which interrupt requests and data are exchanged between the simplex local or remote industry-standard bus (VMEbus) environments and the triplexed core environment. These exchanges are software transparent, yet fully implement all of the algorithms necessary to maintain data consistency and synchronization in the three channels of the core, even in the face of Byzantine faults.
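The idea of detecting and masking a first fault with triplex computation can be illustrated by a majority vote over three channel outputs. The sketch below is a software illustration only: the abstract states that FTCX performs these functions in hardware, the function name `vote` is hypothetical, and the sketch does not model the data-exchange algorithms needed to tolerate Byzantine faults.

```python
def vote(a, b, c):
    """Majority-vote three channel outputs.

    Returns (value, faulty_channel), where faulty_channel is the index
    (0, 1, or 2) of the disagreeing channel, or None if all agree.
    Raises ValueError when no two channels agree.
    """
    if a == b == c:
        return a, None
    if a == b:
        return a, 2
    if a == c:
        return a, 1
    if b == c:
        return b, 0
    raise ValueError("no majority: more than one channel disagrees")


# A single faulty channel is masked and identified.
masked = vote(5, 5, 7)
```

A single disagreeing channel is outvoted, so its fault is masked; two simultaneous disagreements leave no majority, which is why the guarantee applies to first faults.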