For parallel disk array systems, parallelism among disks is the key factor influencing system performance and scale. Unfortunately, the parallelism of cached blocks is largely ignored by cache management schemes that focus on reducing the number of cache misses. As a result, the performance of parallel disk array systems can degrade seriously for workloads with skewed access patterns. To solve this problem, we propose a parallelism-based cache replacement scheme (PCAR) for parallel disk array systems, which exploits both inter-disk parallelism and intra-disk spatial locality. We have implemented a prototype of the PCAR algorithm in Linux 2.6.18. Experimental results show that PCAR outperforms DULO and LRU by up to 22.8% and 33.1%, respectively, in terms of average response time, and by up to 20% and 43.9%, respectively, in terms of throughput.
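The abstract above does not describe PCAR's internals, so the sketch below shows only the LRU baseline it is compared against: a block cache that evicts the least recently used entry and counts hits and misses. The class name `LRUCache`, the method `access`, and the `(disk, block)` tuple used as a block identifier are illustrative assumptions, not taken from the paper.

```python
from collections import OrderedDict


class LRUCache:
    """Baseline LRU block cache (hypothetical names, not the PCAR scheme)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def access(self, block_id):
        """Record an access to block_id; return True on a cache hit."""
        if block_id in self.blocks:
            # A hit makes the block the most recently used one.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            # Evict the least recently used block (front of the dict).
            self.blocks.popitem(last=False)
        self.blocks[block_id] = None
        return False


# Block ids are (disk, block) pairs here purely for illustration.
cache = LRUCache(capacity=2)
trace = [(0, 1), (1, 2), (0, 1), (0, 3), (1, 2)]
results = [cache.access(b) for b in trace]
```

Note that LRU decides purely on recency and never consults the disk component of the identifier; that is the blindness to cached-block parallelism the abstract criticizes.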
Groundwater flow simulation has become one of the top international issues in the new generation of environmental applications. When handling large-scale groundwater flow problems, the intensive computation and large amounts of memory required for modeling are the main bottlenecks for researchers. To solve three-dimensional large-scale groundwater flow problems more rapidly, this paper adopts OpenMP to parallelize the preconditioned conjugate gradient (PCG) algorithm and carries out a numerical experiment with a three-dimensional groundwater flow model on a four-core computer. The experiment shows that the execution time of the original serial PCG program is about 1.74 to 2.86 times that of the parallel PCG program, depending on the number of threads. The experimental results also demonstrate that the OpenMP-based PCG solver is an effective way to solve large-scale three-dimensional groundwater flow problems.
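As a rough illustration of the PCG iteration mentioned above, here is a minimal serial sketch in pure Python. It assumes a Jacobi (diagonal) preconditioner and a dense symmetric positive definite matrix stored as a list of rows; the abstract does not say which preconditioner or storage format the paper uses, and the function names are hypothetical. The comments mark the loops (matrix-vector product, dot products, vector updates) that a shared-memory implementation would typically parallelize.

```python
import math


def dot(u, v):
    # Reduction loop: a natural candidate for a parallel reduction.
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    # Rows are independent, so this loop parallelizes over rows.
    return [dot(row, v) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for SPD A (illustrative)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    if math.sqrt(dot(r, r)) < tol:
        return x, 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        # Element-wise vector updates: independent per index.
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if math.sqrt(dot(r, r)) < tol:
            return x, iteration
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Small SPD test system; the exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution, iterations = pcg(A, b)
```

For the reported figures, a best-case speedup of 2.86 on four cores corresponds to a parallel efficiency of 2.86 / 4, or about 72%.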
As XML plays a crucial role in web services, databases, and document processing, efficient processing of XML queries has become an important issue. In addition, the growing number of users demands high query throughput, so that tens of thousands of queries can be executed in a short time. Given the great success of GPGPU (general-purpose computation on graphics processors), we propose a GPU-based parallel XML query model, built mainly on two efficient task distribution strategies, to improve the efficiency and throughput of XML queries. We have developed a parallel simplified XPath language using the Compute Unified Device Architecture (CUDA) on the GPU, and we evaluate our model on a recent NVIDIA GPU against its counterpart on an eight-core CPU. The experimental results show that our model achieves both higher throughput and higher efficiency than CPU-based XML querying.
Emerging accelerator architectures, such as GPUs, have proved successful in providing significant performance gains in various application domains by exploiting data parallelism in existing algorithms. However, programming in a data-parallel fashion imposes extra burdens on programmers, who are used to writing sequential programs. New programming models and frameworks are needed to strike a balance between programmability, portability, and performance. We start from the stream processing domain and propose GStream, a general-purpose, scalable data streaming framework on GPUs. The contributions of GStream are as follows: (1) We provide powerful yet concise language abstractions suitable for describing conventional algorithms as streaming problems. (2) We project these abstractions onto GPUs to fully exploit their inherent massive data parallelism. (3) We demonstrate the viability of streaming on accelerators. Experiments show that the proposed framework provides flexibility, programmability, and performance gains for benchmarks from a range of domains, including but not limited to data streaming, data-parallel problems, numerical codes, and text search. This work lays a foundation for our future work on more general data-parallel programming models for many-core architectures.
Due to its applicability to numerous types of data, including telephone records, web documents, and click streams, the data stream model has recently attracted attention. For analysis of such data, it is crucial to pr...
ISBN: (print) 9780769543123
The proceedings contain 54 papers. The topics discussed include: an efficient video program delivery algorithm in tree networks; an efficient content delivery algorithm for intermittently connected mobile ad hoc networks; one-hop neighbor transmission coverage information based distributed algorithm for connected dominating set; a novel P2P identification algorithm based on genetic algorithm and particle swarm optimization; accelerating reconfiguration for degradable mesh-connected processor arrays; a novel approach for multilevel fixed outline floorplanning; scheduling multiple multithreaded applications on asymmetric and symmetric chip multiprocessors; a hybrid fault tolerance model for reliable scheduling of critical real-time applications on grid systems; GTFTTS: a generalized tit-for-tat based corporative game for temperature-aware task scheduling in multi-core systems; and a scheduling strategy on load balancing of virtual machine resources in cloud computing environment.
Recently we proposed occam-pi as a high-level language for programming massively parallel reconfigurable architectures. The design of occam-pi incorporates ideas from CSP and pi-calculus to facilitate expressing parallelism and reconfigurability. The feasibility of this approach was illustrated by building three occam-pi implementations of DCT executing on an Ambric. However, because DCT is a simple and well-studied algorithm, it remained uncertain whether occam-pi would also be effective for programming novel, more complex algorithms. In this paper, we demonstrate the applicability of occam-pi for expressing various degrees of parallelism by implementing a significantly larger case study, focus criterion calculation in an autofocus algorithm, on the Ambric architecture. Autofocus is a key component of synthetic aperture radar systems. Two implementations of focus criterion calculation were developed and evaluated on the basis of performance. A comparison with a single-threaded software implementation of the same algorithm shows that the throughput of the two implementations is 11x and 23x higher than that of the sequential implementation, despite a much lower (9x) clock frequency. The two designs are, respectively, 29x and 40x more energy efficient.
FFT is a widely used algorithm, and its parallelization is a very important topic. Much work has been done in this field, and many parallel algorithms have been published over several decades. In this paper, an algorithm ...