Event-driven programming has been a relatively hot topic in distributed systems development. Having worked on these systems for years, we now believe that it is not the best choice. Besides the well-known "stack ...
With the wide application of multi-core processor architectures in high performance computing, fault tolerance for shared-memory parallel programs has become a research hot spot. For years, checkpointing ha...
The force-directed approach is one of the most widely used methods in graph drawing research. However, its running time grows intolerably as the graph size increases, which restricts the algorithm's practicality. With the aid of a GPU (graphics processing unit) computing platform, graph layout can be sped up at low cost, but existing GPU implementations mainly employ a "one-by-one" style to update the vertex coordinates in each iteration, which has a lower convergence rate than the "batch" style commonly used in traditional CPU implementations. As a result, the aesthetics of the graph layout degrade if the total running time is restricted. It is hard to achieve both a high speedup factor of GPU over CPU and a high convergence rate in existing GPU implementations. To partially solve this problem, this paper presents two new strategies for implementing large-scale graph layout on a CPU+GPU heterogeneous platform to accelerate force-directed layout for graph drawing. The numerical results show that our GPU implementation dramatically improves the performance of force-directed layout: on an NVIDIA GeForce 9800 GT GPU at 1.44 GHz it is 20 times faster than the implementation on a single CPU core of an Intel Pentium 4 PC at 3.0 GHz for graph layouts of moderate size (typically 1000 vertices).
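For readers unfamiliar with force-directed layout, the basic iteration can be sketched as follows. This is a minimal CPU-only illustration and does not reproduce the paper's two strategies or its GPU code. The Fruchterman-Reingold-style force laws (repulsion k^2/d, attraction d^2/k), the function names, and the parameters `k`, `step` and `max_move` are all assumptions chosen for the sketch. Every displacement in one iteration is computed from the same snapshot of positions; this is one common way to organise the loop and is not claimed to correspond to either update style contrasted in the abstract.

```python
import math

def net_force(i, pos, edges, k=1.0):
    """Net force on vertex i: repulsion k^2/d from every other vertex,
    attraction d^2/k along each incident edge (assumed force laws)."""
    xi, yi = pos[i]
    fx = fy = 0.0
    for j, (xj, yj) in enumerate(pos):
        if j == i:
            continue
        dx, dy = xi - xj, yi - yj
        d = math.hypot(dx, dy) or 1e-9      # avoid division by zero
        fx += (dx / d) * (k * k / d)        # pushed away from vertex j
        fy += (dy / d) * (k * k / d)
    for a, b in edges:
        if a == b or i not in (a, b):
            continue
        j = b if a == i else a
        dx, dy = xi - pos[j][0], yi - pos[j][1]
        d = math.hypot(dx, dy) or 1e-9
        fx -= (dx / d) * (d * d / k)        # pulled toward neighbour j
        fy -= (dy / d) * (d * d / k)
    return fx, fy

def layout_step(pos, edges, k=1.0, step=0.05, max_move=0.5):
    """One iteration: compute all forces from the current positions,
    then move every vertex by a damped, capped displacement."""
    forces = [net_force(i, pos, edges, k) for i in range(len(pos))]
    new_pos = []
    for (x, y), (fx, fy) in zip(pos, forces):
        mx, my = step * fx, step * fy
        m = math.hypot(mx, my)
        if m > max_move:                    # cap the move for stability
            mx, my = mx / m * max_move, my / m * max_move
        new_pos.append((x + mx, y + my))
    return new_pos

def layout(pos, edges, iterations=200, **params):
    """Run a fixed number of iterations and return the final positions."""
    for _ in range(iterations):
        pos = layout_step(pos, edges, **params)
    return pos
```

The all-pairs repulsion loop costs O(n^2) work per iteration, which is the part that grows quickly with graph size and is the natural candidate for GPU offloading.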
In this paper, we introduce a generic model to deal with the event matching problem of content-based publish/subscribe systems over structured P2P overlays. In this model, we claim that there are three methods (event...
ISBN: (print) 9781424477548; 9781424477555
Building distributed applications is difficult, mostly because of concurrency management. Existing approaches are primarily based on events and threads. Researchers and developers have debated for decades which is superior. Although the conclusion is far from obvious, this long debate clearly shows that neither is perfect. One problem is that both are complex and error-prone. Both events and threads require programmers to manage concurrency explicitly, and we believe this is the source of the difficulty. In this paper, we propose a novel approach: automatic concurrency management by the runtime system. It dynamically analyzes programs to discover potential concurrency opportunities, and it dynamically schedules communication and computation tasks, resulting in automatic concurrent execution. This approach is inspired by the instruction scheduling techniques used in modern microprocessors, which dynamically exploit instruction-level parallelism. However, hardware scheduling algorithms do not fit software in many respects, so we had to design a new scheme from scratch. Automatic concurrency management is a runtime technique that requires no modification to the language, compiler, or bytecode, so it offers good backward compatibility. It is essentially a dynamic optimization for networking programs.
The efficiency of communication is a key factor in the performance of networking applications, and concurrent communication is an important way to improve it. However, many concurrency opportunities are very difficult to exploit because they depend on nondeterministic conditions. If these conditions are highly predictable, speculative execution can be a very effective way to cope with the uncertainty. Existing research on speculation seldom targets networking systems, and none of it can handle the event-driven model that is very popular in such systems. In this paper, we propose Nexus, a novel speculation scheme that supports event-driven networking applications. Nexus analyzes the dependence relationships among events and performs speculation according to the duality of events and threads. Evaluation of a prototype implementation of Nexus shows that this approach can significantly reduce the time needed to complete an event-driven program.
In light of its powerful computing capacity and high energy efficiency, the GPU (graphics processing unit) has become a focus of research in HPC (High Performance Computing). CPU-GPU heterogeneous parallel systems have become a new trend in supercomputer development. However, the inherent unreliability of GPU hardware degrades the reliability of such supercomputers. We have studied fault-tolerance (FT) techniques for CPU-GPU heterogeneous parallel systems and introduce a new checkpointing mechanism for them: hierarchical application-level checkpointing. The basic idea is to checkpoint at two independent levels, the CPU level and the GPU level, to tolerate CPU faults and GPU faults respectively. Based on this idea, we have also designed and implemented a hierarchical application-level checkpointing tool, "HiAL-Ckpt". Using this tool, programmers can insert two kinds of directives, CPU directives and GPU directives, into a program, and the compiler transforms the directives into CPU or GPU checkpointing code according to their kind. Through a case study of SWIM, a benchmark from the SPEC2000 benchmark suite, we demonstrate the validity of the hierarchical application-level checkpointing technique. The experimental results show that the fault-tolerance time overhead of HiAL-Ckpt for SWIM is only 2.25% relative to the execution time of SWIM without any FT work.
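The two-level idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it is not HiAL-Ckpt, it uses plain function calls rather than compiler directives, and the "GPU" is simulated with ordinary Python data. The names `TwoLevelCheckpoint`, `kernel` and `run`, and the placement of the checkpoints, are hypothetical.

```python
import copy

class TwoLevelCheckpoint:
    """Keeps the latest CPU-level and GPU-level checkpoints independently."""

    def __init__(self):
        self._saved = {"cpu": None, "gpu": None}

    def save(self, level, state):
        self._saved[level] = copy.deepcopy(state)

    def restore(self, level):
        if self._saved[level] is None:
            raise RuntimeError(f"no {level} checkpoint has been taken")
        return copy.deepcopy(self._saved[level])

def kernel(gpu_state):
    """Simulated GPU kernel: increments every element of the buffer."""
    gpu_state["buffer"] = [x + 1.0 for x in gpu_state["buffer"]]

def run(fault_at=None):
    """Three iterations of host bookkeeping plus one simulated kernel each.
    If fault_at names an iteration, the GPU buffer is corrupted there and
    only the GPU level is rolled back and re-executed."""
    ckpt = TwoLevelCheckpoint()
    cpu_state = {"iteration": 0}
    gpu_state = {"buffer": [0.0, 0.0, 0.0]}
    ckpt.save("cpu", cpu_state)
    for it in range(1, 4):
        ckpt.save("gpu", gpu_state)              # GPU-level checkpoint before the kernel
        kernel(gpu_state)
        if it == fault_at:
            gpu_state["buffer"] = [float("nan")] * 3   # simulated GPU fault
            gpu_state = ckpt.restore("gpu")      # roll back the GPU level only
            kernel(gpu_state)                    # re-run the failed kernel
        cpu_state["iteration"] = it
        ckpt.save("cpu", cpu_state)              # CPU-level checkpoint after the step
    return cpu_state, gpu_state
```

In this sketch a GPU fault never touches the CPU-level checkpoint, so recovery re-executes a single kernel rather than the whole program, and `run(fault_at=2)` ends in the same state as a fault-free `run()`.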
As one of the components of the iVCE software platform, iVCE/M is devoted to improving the performance of I/O-intensive and memory-intensive applications through efficient aggregation of distributed memory resources. To facilitate the deployment of iVCE/M, two things are important in its implementation: a data-locating algorithm with balanced time and space costs, and a transparent interface that lets legacy applications run without code modification. We propose a logarithmic-search-tree-based client-side metadata structure that accelerates data locating with moderate memory consumption, an implicit I/O redirection mechanism, and an implementation of an iVCE/M-based disk cache system. Experiments with cross-domain emulation show that the scheme can exploit distributed memory resources for applications with small-granularity I/O accesses.
OpenMP is a widely used parallel programming model on traditional multi-core processors. Generally, OpenMP is used to develop fine-grained parallelism through a multithreaded model. The stream programming model is a new kind of parallel programming model for stream architectures. OpenMP resembles the stream programming model at some level. The transformation between the two models has attracted much attention from the research community, since it is the foundation for porting programs between the two architectures. Most related research focuses on the efficiency of porting existing parallel programs to new architectures such as GPUs. Very few of these studies, however, address the portability problem systematically, namely, what kinds of parallel programs can or should be ported to stream programs and mapped onto stream processors. In this paper, we study how the parallel mechanisms in OpenMP map to the stream programming model, and point out those OpenMP mechanisms that are infeasible or undesirable for stream programs. By analyzing two typical benchmarks, we conclude that a majority of scientific applications are suitable for mapping to the stream programming model. Our conclusion supports the idea of accelerating scientific applications with stream processors.
In recent years, heterogeneous parallel systems have become a focus of research in the high performance computing field. Generally, in a heterogeneous parallel system, the CPU provides the basic computing environment and a special-purpose accelerator (a GPU in this paper) provides high computing performance. However, the overall performance of the system tends to be limited by data communication between the CPU and the GPU. Data communication is typically used to synchronize the array on the CPU with the stream (in AMD's terminology) on the GPU. In many cases, programmers simply add data synchronization for each GPU invocation independently. Programming in this manner is easy, but it can introduce much redundant communication, which dramatically degrades overall performance. To alleviate this problem, we propose a heuristic data communication scheduling approach based on the stream programming model. By analyzing the state transitions of each stream/array data pair, conditionally relaxing the synchronization strategy, and optimizing for branch and loop control structures, our approach can significantly reduce redundant data communication in most cases.
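The idea of tracking the state of a stream/array pair to avoid redundant transfers can be sketched with a small simulation. This is an illustrative assumption, not the paper's scheduling algorithm: it tracks validity lazily at run time with two flags, the "GPU" is simulated with Python lists, and the class and method names (`SyncedPair`, `run_kernel`, `host_read`, `host_write`) are hypothetical.

```python
class SyncedPair:
    """A CPU array and its GPU stream, with flags recording which copy is current."""

    def __init__(self, data):
        self.array = list(data)     # host (CPU) copy
        self.stream = None          # simulated device (GPU) copy
        self.host_valid = True
        self.device_valid = False
        self.transfers = 0          # number of copies actually performed

    def _to_device(self):
        if not self.device_valid:   # copy only when the stream is stale
            self.stream = list(self.array)
            self.device_valid = True
            self.transfers += 1

    def _to_host(self):
        if not self.host_valid:     # copy only when the array is stale
            self.array = list(self.stream)
            self.host_valid = True
            self.transfers += 1

    def run_kernel(self, fn):
        """Simulated GPU kernel that reads and overwrites the stream."""
        self._to_device()
        self.stream = [fn(x) for x in self.stream]
        self.host_valid = False     # the host copy is now out of date

    def host_read(self):
        self._to_host()
        return list(self.array)

    def host_write(self, data):
        self.array = list(data)
        self.host_valid = True
        self.device_valid = False   # the device copy is now out of date


pair = SyncedPair([1, 2, 3])
for _ in range(3):                  # three back-to-back kernels
    pair.run_kernel(lambda x: x + 1)
result = pair.host_read()
```

Synchronizing around every invocation independently would perform six copies for these three kernels (one upload and one download each), whereas the state flags reduce this to one upload before the first kernel and one download at the final host read.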