In recent years, streaming accelerators such as GPUs have emerged as an effective step towards parallel computing. The wish list for these devices ranges from support for thousands of small cores to a nature very close to general-purpose computing. This makes the design space for future accelerators containing thousands of parallel streaming cores very large, which in turn complicates the choice of the right architectural configuration for next-generation devices. However, accurate design space exploration tools developed for massively parallel architectures can ease this task. The main objectives of this work are twofold. (i) We present a complete environment for a trace-driven simulator named SArcs (Streaming Architectural Simulator) for streaming accelerators. (ii) We use our simulation tool-chain for design space exploration of GPU-like streaming architectures. Our design space explorations for different architectural aspects of a GPU-like device are made with reference to a baseline established for NVIDIA's Fermi architecture (GPU Tesla C2050). The explored aspects include the performance effects of variations in the configurations of the Streaming Multiprocessors, the Global Memory Bandwidth, the Channels between the SMs and the Memory Hierarchy, and the Cache Hierarchy. The explorations are performed using application kernels from Vector Reduction, 2D-Convolution, Matrix-Matrix Multiplication and 3D-Stencil. Results show that the configurations of the computational resources for the current Fermi GPU device can deliver higher performance with further improvement in the global memory bandwidth for the same device.
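The closing result, that the existing compute resources could deliver more if global memory bandwidth were higher, can be illustrated with a simple roofline-style bound. This is only a sketch and is not the SArcs simulator; the function name and all numbers below are hypothetical placeholders, not measured Fermi figures.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: performance is capped either by the compute
    peak or by how fast memory can feed the cores, whichever is lower."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


if __name__ == "__main__":
    # Hypothetical device: 1000 GFLOP/s peak, kernel doing 2 FLOPs per byte.
    for bandwidth in (100.0, 200.0, 400.0):
        print(bandwidth, attainable_gflops(1000.0, bandwidth, 2.0))
```

For a kernel with low arithmetic intensity, the bound scales with bandwidth until the compute peak is reached, which is the kind of behaviour the abstract reports.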
ISBN: 9781479900626 (print)
Cloud Robotics is an emerging field within robotics, currently covering various application domains and robot network paradigms. This paper provides a structured, systematic overview of the numerous definitions, concepts and technologies linked to Cloud Robotics and cloud technologies in a broader sense. It also presents a roadmap for the near future, describing development trends and emerging application areas. Cloud Robotics may have a significant role in the future as an explicitly human-centered technology, capable of addressing the dire needs of our society.
ISBN: 9781467351652; 9780769549149 (print)
The execution of parallel applications using grid computing requires an environment that enables them to be executed, managed, scheduled and monitored. The execution environment must provide a processing model, consisting of programming and execution models, with the objective of appropriately exploiting grid computing characteristics. This paper proposes a parallel processing model for grid computing based on shared variables, consisting of an execution model that is appropriate for the grid and a CPAR parallel language programming model. The environment is designed to execute parallel applications in grid computing in such a way that all the characteristics of grid computing are transparent to users. The results show that this environment is an efficient solution for the execution of parallel applications.
ISBN: 9780769548791 (print)
With the continuous development of GPUs, modern general-purpose computation on GPUs (GPGPU) provides growing parallelism to general programs beyond graphics applications. However, for programs that involve both the CPU and the GPU, the data transmission bandwidth between them may become a bottleneck that prevents the GPU from fully exploiting its parallel computing capacity. To avoid this problem, we reduce the data transmission by keeping part of the computation on the CPU side rather than sending all the data to the GPU and processing it there. In this way the computation is done on the CPU and the GPU in parallel, which also reduces the overall processing time. To split the computation workload systematically, we divide the corresponding data into chunks of appropriate size. We evaluated our data division and heterogeneous memory scheduling with two benchmarks. Matrix multiplication is more than 30% faster, and k-means2D is nearly 10% faster, than running solely on the GPU.
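The idea of keeping part of the data on the CPU can be sketched with a simple balance model. This is not the authors' scheduler; the cost model, the function names `split_fraction` and `split_chunks`, and the rates used are all assumptions made for illustration. The model picks the CPU share so that the CPU finishes at the same time as the GPU path, where the GPU path pays for both transfer and compute.

```python
def split_fraction(cpu_rate, gpu_rate, link_rate):
    """Fraction of items to keep on the CPU so both sides finish together.

    Rates are in items per second (hypothetical units). An item sent to the
    GPU costs transfer time plus compute time; an item kept on the CPU costs
    compute time only.
    """
    gpu_cost = 1.0 / gpu_rate + 1.0 / link_rate  # seconds per item, GPU path
    cpu_cost = 1.0 / cpu_rate                    # seconds per item, CPU path
    # Solve f * cpu_cost == (1 - f) * gpu_cost for f.
    return gpu_cost / (cpu_cost + gpu_cost)


def split_chunks(n_items, chunk_size, cpu_fraction):
    """Cut [0, n_items) into chunks and assign the first share to the CPU."""
    chunks = [(start, min(start + chunk_size, n_items))
              for start in range(0, n_items, chunk_size)]
    n_cpu = round(len(chunks) * cpu_fraction)
    return chunks[:n_cpu], chunks[n_cpu:]


if __name__ == "__main__":
    f = split_fraction(cpu_rate=1.0, gpu_rate=4.0, link_rate=4.0)
    cpu_chunks, gpu_chunks = split_chunks(1000, 100, f)
    print(f, len(cpu_chunks), len(gpu_chunks))
```

A slower link raises the per-item cost of the GPU path and therefore shifts more chunks to the CPU, which matches the motivation given in the abstract.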
ISBN: 9780769548791 (print)
The finite difference time domain (FDTD) method is a robust and accurate algorithm that is widely used in computational electromagnetics and in the simulation of optical phenomena. In this paper, parallel FDTD based on overlapped domain decomposition is used to simulate the band gap of photonic crystals and the quantum efficiency of thin-film solar cells. The light-trapping effect, which is very important for improving light absorption, is also analyzed with parallel FDTD. Numerical results demonstrate that the accuracy and the speedup of parallel FDTD are very high for large-scale problems.
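As a rough illustration of domain decomposition with overlap in FDTD, the sketch below runs a 1D Yee-style update on a single grid and on two subdomains that share one overlapping cell per field. It is not the authors' code: the 1D setting, the one-cell overlap, the Courant factor `c` and all function names are assumptions, and the two "exchange" assignments stand in for the messages that parallel processes would send.

```python
import math


def initial_ez(n):
    """Gaussian pulse in Ez; both end cells are fixed to 0 (PEC walls)."""
    ez = [math.exp(-((i - n / 2) ** 2) / 20.0) for i in range(n)]
    ez[0] = ez[-1] = 0.0
    return ez


def run_global(n, steps, c=0.5):
    """Reference run on one grid: hy[i] sits between ez[i] and ez[i + 1]."""
    ez, hy = initial_ez(n), [0.0] * (n - 1)
    for _ in range(steps):
        for i in range(n - 1):
            hy[i] += c * (ez[i + 1] - ez[i])
        for i in range(1, n - 1):
            ez[i] += c * (hy[i] - hy[i - 1])
    return ez


def run_split(n, m, steps, c=0.5):
    """Same update on two subdomains cut at cell m (requires 1 <= m < n)."""
    ez = initial_ez(n)
    ez_l, hy_l = ez[:m + 1], [0.0] * m    # ez_l[m] is a ghost copy of ez[m]
    ez_r, hy_r = ez[m:], [0.0] * (n - m)  # hy_r[0] is a ghost copy of hy[m-1]
    for _ in range(steps):
        for i in range(m):                # left H update uses ghost ez_l[m]
            hy_l[i] += c * (ez_l[i + 1] - ez_l[i])
        for j in range(1, n - m):         # right H update, owned cells only
            hy_r[j] += c * (ez_r[j] - ez_r[j - 1])
        hy_r[0] = hy_l[m - 1]             # exchange: left sends boundary H
        for i in range(1, m):             # left E update
            ez_l[i] += c * (hy_l[i] - hy_l[i - 1])
        for j in range(n - m - 1):        # right E update uses ghost hy_r[0]
            ez_r[j] += c * (hy_r[j + 1] - hy_r[j])
        ez_l[m] = ez_r[0]                 # exchange: right sends boundary E
    return ez_l[:m] + ez_r


if __name__ == "__main__":
    a, b = run_global(64, 100), run_split(64, 32, 100)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

Because each subdomain only needs its neighbour's boundary value once per half step, the communication volume stays constant while the compute per subdomain grows with its size, which is why good speedup is plausible for large problems.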
ISBN: 9780769548791 (print)
A class of simple network topologies is proposed and analyzed in this paper as part of the research on a polymorphous array architecture for graphics and image processing. The topologies are extended from the mesh network topology and are amenable to VLSI implementation. Simulation and theoretical analyses show that these topologies have many advantages over the mesh and Xmesh topologies. Routing algorithms are also proposed and analyzed for these new network topologies.
ISBN: 9780769548791 (print)
The conventional unified parallel computation model is becoming more and more complicated, and it offers weak pertinence and little guidance for each parallel computing phase. Therefore, a general layered and heterogeneous approach to parallel computation model research is proposed in this paper. The general layered heterogeneous parallel computation model is composed of a parallel algorithm design model, a parallel programming model and a parallel execution model, which correspond to the three computing phases respectively. The properties of each model are described and research directions are also given. In the parallel algorithm design model, an advanced language is designed for algorithm designers, and a corresponding interpretation system based on text scanning is proposed to map the advanced language to machine language that runs on heterogeneous software and hardware architectures. A parallel method library and a parameter library are also provided to achieve comprehensive utilization of the different computing resources and to assign parallel tasks reasonably. Theoretical analysis shows that the general layered heterogeneous parallel computation model is clear and has a single goal for each parallel computing phase.
ISBN: 9780769548791 (print)
In this paper, a scalable solver for time domain electromagnetic simulations is studied. The solver is developed with the discontinuous Galerkin (DG) method on unstructured meshes. A high-order nodal basis, employing multivariate Lagrange polynomials defined on the triangles, is used to expand the electromagnetic fields. Both periodic and perfect electric conductor (PEC) conditions have been implemented. Domain decomposition is used for the parallelization, and good scalability is achieved thanks to the DG algorithm. Benchmarks for this solver on different supercomputers are shown. Finally, large simulations with 1024 processors are shown.
ISBN: 9780769548791 (print)
The AS4DR (Adaptive Scheduling for Distributed Resources) scheduling method evaluated experimentally in this paper aims at maximizing CPU use efficiency when executing divisible load applications on heterogeneous distributed memory platforms. AS4DR adapts the scheduling to two conditions: the total workload is not known in advance, and the execution parameters (available communication speed, available computing speed, etc.) are both unspecified and vary over time. This paper presents the first experimental assessments of the adaptivity of the scheduling with this method.
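One generic way to adapt a divisible-load schedule when worker speeds are unspecified and change over time is to resize each worker's next chunk from the speed it showed in the previous round. The sketch below illustrates only that general idea and should not be read as the AS4DR algorithm itself; the function name, the round-based structure and the target round duration are assumptions.

```python
def next_chunk_sizes(prev_sizes, measured_times, target_round_time):
    """Size each worker's next chunk from its last observed speed.

    prev_sizes[i]     -- load units given to worker i in the last round
    measured_times[i] -- seconds worker i took for that chunk
    The new size aims for every worker to finish in target_round_time,
    so no total workload needs to be known in advance.
    """
    sizes = []
    for size, elapsed in zip(prev_sizes, measured_times):
        observed_speed = size / elapsed  # load units per second
        sizes.append(observed_speed * target_round_time)
    return sizes


if __name__ == "__main__":
    # Worker 1 was twice as slow as worker 0 on equal chunks.
    print(next_chunk_sizes([100.0, 100.0], [1.0, 2.0], 2.0))
```

Re-estimating the speed every round is what lets such a scheme follow variations in available computing or communication speed.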
ISBN: 9781467351652; 9780769549149 (print)
Fault tolerance issues related to the implementation of distributed iterative algorithms via the P2PDC peer-to-peer distributed computing environment are considered. P2PDC is a decentralized environment dedicated to task-parallel applications. It has been designed in particular for the solution of large-scale numerical simulation problems via distributed iterative algorithms. The environment allows frequent and direct communication between peers, i.e., machines. P2PDC is based on P2PSAP, a self-adaptive communication protocol. We present new functionalities of P2PDC aimed at making our environment more robust. An adaptive fault tolerance mechanism ensures the robustness of the computation in the presence of peer faults. We also consider fault tolerance from an algorithmic point of view: we concentrate in particular on distributed asynchronous iterative algorithms that can tolerate some message loss. A series of computational results is presented and analyzed for a numerical simulation problem.