ISBN: (print) 9781424477548; 9781424477555
Building distributed applications is difficult mostly because of concurrency management. The existing approaches are primarily events and threads. Researchers and developers have debated for decades which is superior. Although the conclusion is far from obvious, this long debate clearly shows that neither is perfect. One problem is that both are complex and error-prone. Both events and threads require programmers to manage concurrency explicitly, and we believe this is precisely the source of the difficulty. In this paper, we propose a novel approach: automatic concurrency management by the runtime system. It dynamically analyzes programs to discover potential concurrency opportunities, and it dynamically schedules the communication and computation tasks, resulting in automatic concurrent execution. This approach is inspired by the instruction scheduling techniques used in modern microprocessors, which dynamically exploit instruction-level parallelism. However, hardware scheduling algorithms do not fit software in many respects, so we had to design a new scheme from scratch. Automatic concurrency management is a runtime technique that requires no modification to the language, compiler, or byte code, so it is backward compatible. It is essentially a dynamic optimization for networking programs.
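The abstract does not describe the scheduling scheme itself, so the following is only a minimal sketch of the general idea it alludes to: issuing a task as soon as its inputs are available, in the spirit of dynamic instruction scheduling. The function `run_dataflow`, the `(output, inputs, fn)` task format, and `fake_fetch` are hypothetical names chosen for illustration and are not taken from the paper.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_dataflow(tasks, max_workers=4):
    """Run each task as soon as all of its inputs have been produced.

    tasks: list of (output_name, input_names, fn) in program order.
    Each output name is assumed to be produced exactly once, so only
    true (read-after-write) dependencies exist in this sketch.
    """
    values = {}            # results produced so far, keyed by output name
    pending = list(tasks)  # tasks not yet issued
    running = {}           # future -> output name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Issue every pending task whose inputs are all available.
            ready = [t for t in pending if all(name in values for name in t[1])]
            for task in ready:
                pending.remove(task)
                output, inputs, fn = task
                args = [values[name] for name in inputs]
                running[pool.submit(fn, *args)] = output
            if not running:
                raise ValueError("remaining tasks have unsatisfiable inputs")
            # Block until at least one running task finishes, then record it.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                values[running.pop(future)] = future.result()
    return values


def fake_fetch(value):
    """Stand-in for a network request (assumption: 50 ms latency)."""
    time.sleep(0.05)
    return value


# Written sequentially, but "a" and "b" are independent and overlap at run time.
program = [
    ("a", (), lambda: fake_fetch(1)),
    ("b", (), lambda: fake_fetch(2)),
    ("total", ("a", "b"), lambda a, b: a + b),
]
results = run_dataflow(program)
```

In this sketch the two fetches are issued together because neither depends on the other, while `total` waits for both. A real runtime would have to discover such dependencies by analyzing the program rather than receive them as declared inputs.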
In this paper, we introduce a generic model to deal with the event matching problem of content-based publish/subscribe systems over structured P2P overlays. In this model, we claim that there are three methods (event...
Parallel programming is the future of computer science, and the current shift to parallel processing makes it even more relevant. This research effort aims to support parallelism education on real-life target systems, using production-oriented software tools. It begins with a survey of software environments for parallel programming. The surveyed environments are categorized according to their main function, and an identity is synthesized for each environment from its software and project attributes. Based on these identities and a set of suitable criteria, two groups of tools are selected: those of primary and those of secondary interest for this research. An analysis of functional characteristics is performed for both groups. From the first group, an open-source software environment is chosen as the base platform, to be enriched with education-oriented enhancements. The characteristics analysis is then used to propose a research and development framework whose target is the support of parallel programming on real-life target systems, using production-oriented software environments.
The efficiency of communication is a key factor in the performance of networking applications, and concurrent communication is an important way to achieve it. However, many concurrency opportunities are very difficult to exploit because they depend on nondeterministic conditions. If these conditions are highly predictable, speculative execution can be a very effective way to cope with the uncertainty. Existing research on speculation seldom targets networking systems, and none of it can handle the event-driven model that is very popular in such systems. In this paper, we propose Nexus, a novel speculation scheme that supports event-driven networking applications. Nexus analyzes the dependence relationships among events, and performs speculation according to the duality of events and threads. Evaluation on a prototype implementation of Nexus shows that this approach can significantly reduce the time needed to complete an event-driven program.
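As an illustration of speculative execution in general (not of Nexus, whose event dependence analysis the abstract does not detail), the sketch below starts the work that depends on a condition before the condition is known, then either keeps or discards the result. The function `speculate` and its parameters are hypothetical names, and the speculative work is assumed to have no side effects.

```python
from concurrent.futures import ThreadPoolExecutor


def speculate(resolve_condition, predicted, speculative_work, fallback_work):
    """Overlap a slow condition check with work that assumes its outcome.

    Returns (result, prediction_was_correct). speculative_work must be
    free of side effects, because its result is thrown away on a miss.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        condition = pool.submit(resolve_condition)  # e.g. a remote check
        guess = pool.submit(speculative_work)       # runs concurrently
        actual = condition.result()
        if actual == predicted:
            return guess.result(), True             # commit the speculation
        guess.cancel()                              # best effort; may already be running
        return fallback_work(actual), False         # recover on misprediction


hit = speculate(lambda: "cached", "cached",
                lambda: "served from cache", lambda c: f"handled {c}")
miss = speculate(lambda: "stale", "cached",
                 lambda: "served from cache", lambda c: f"handled {c}")
```

The benefit comes only when the prediction is usually right: a hit hides the latency of the condition check, while a miss costs the wasted speculative work plus the fallback.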
Owing to its powerful computing capacity and high energy efficiency, the GPU (graphics processing unit) has become a focus of research in HPC (high performance computing). CPU-GPU heterogeneous parallel systems have become a new development trend in supercomputers. However, the inherent unreliability of GPU hardware degrades the reliability of such supercomputers. We have studied fault-tolerance (FT) techniques for CPU-GPU heterogeneous parallel systems and introduce a new checkpointing mechanism for them: hierarchical application-level checkpointing. The basic idea of this mechanism is to checkpoint at two independent levels, the CPU level and the GPU level, to tolerate CPU and GPU faults respectively. Based on this idea, we have also designed and implemented a hierarchical application-level checkpointing tool, "HiAL-Ckpt". Using this tool, programmers can insert two kinds of directives, CPU directives and GPU directives, into a program, and the compiler transforms each directive into CPU or GPU checkpointing code according to its kind. Through a case study of SWIM, a benchmark from the SPEC2000 suite, we demonstrate the validity of the hierarchical application-level checkpointing technique. The experimental results show that the fault-tolerance time overhead of HiAL-Ckpt for SWIM is only 2.25% relative to the execution time of SWIM without any FT work.
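The two-level idea can be sketched in a few lines. This is a pure-Python simulation under stated assumptions, not HiAL-Ckpt: the abstract does not give the directive syntax, so none is shown, and `TwoLevelCheckpoint`, `run`, and the list standing in for GPU memory are all hypothetical.

```python
import copy


class TwoLevelCheckpoint:
    """Keeps independent snapshots for the CPU level and the GPU level."""

    def __init__(self):
        self._saved = {"cpu": None, "gpu": None}

    def save(self, level, state):
        self._saved[level] = copy.deepcopy(state)

    def restore(self, level):
        if self._saved[level] is None:
            raise RuntimeError(f"no {level} checkpoint has been saved")
        return copy.deepcopy(self._saved[level])


def run(steps, fail_at=None):
    """Iterate a stand-in kernel; optionally inject one GPU fault."""
    ckpt = TwoLevelCheckpoint()
    cpu = {"step": 0, "total": 0}  # host-side program state
    gpu = {"buf": [1, 2, 3]}       # stand-in for device memory
    faulted = False
    while cpu["step"] < steps:
        ckpt.save("cpu", cpu)
        ckpt.save("gpu", gpu)
        gpu["buf"] = [2 * x for x in gpu["buf"]]  # stand-in GPU kernel
        if cpu["step"] == fail_at and not faulted:
            faulted = True
            gpu["buf"] = [-1] * 3      # simulated corruption of device data
            gpu = ckpt.restore("gpu")  # roll back the GPU level only
            continue                   # redo the step; CPU state is untouched
        cpu["total"] += sum(gpu["buf"])
        cpu["step"] += 1
    return cpu["total"]


clean = run(3)
recovered = run(3, fail_at=1)
```

Here a GPU fault is handled by restoring only the GPU-level snapshot and re-running the kernel, so the run with an injected fault produces the same total as the clean run. A CPU fault would instead restore the `"cpu"` snapshot.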
As one of the components of the iVCE software platform, iVCE/M is devoted to improving the performance of I/O-intensive and memory-intensive applications through efficient aggregation of distributed memory resources. To facilitate the deployment of iVCE/M, two elements are significant in its implementation: a data locating algorithm with balanced time and space cost, and a transparent interface that serves legacy applications without code modification. We propose a logarithmic-search-tree-based client-side metadata structure that accelerates data locating with moderate memory consumption, an implicit I/O redirection mechanism, and an implementation of an iVCE/M-based disk cache system. Experiments with cross-domain emulation show that the scheme is applicable to exploiting distributed memory resources for applications with small-granularity I/O accesses.
As prediction methods grow richer and more information about system behavior becomes accessible, proactive strategies play an increasingly crucial role in developing more reliable and efficient systems. However, the goals of prediction and the way its results can be employed to improve the system are still open topics that attract researchers' attention. In this work, we attempt to decrease job wait time and failure rate by using the results of a job futurity predictor. To achieve this goal, a system component called JSM is proposed. JSM consults the predictor and employs a game-theory-based model to probabilistically reject jobs that are likely to fail. Furthermore, to avoid mistakenly rejecting safe jobs, JSM adapts its decisions to the system's situation. Experimental results show a significant reduction in job wait time and failure rate in comparison with other related work.
OpenMP is a widely used parallel programming model on traditional multi-core processors. Generally, OpenMP is used to exploit fine-grained parallelism through a multithreaded model. The stream programming model is a new kind of parallel programming model for stream architectures. OpenMP bears a resemblance to the stream programming model at some level. The transformation between the two models has attracted much attention from the research community, since it is the foundation for porting programs between the two architectures. Most related research focuses on the efficiency of porting existing parallel programs to new architectures such as GPUs. Very few of these studies, however, address the portability problem systematically, namely, what kinds of parallel programs can or should be transformed into stream programs and mapped to run on stream processors. In this paper, we study the mapping of the parallel mechanisms in OpenMP to the stream programming model, and point out those parallel mechanisms in OpenMP that are infeasible or undesirable for stream programs. By analyzing two typical benchmarks, we conclude that a majority of scientific applications are suitable for mapping to the stream programming model. Our conclusion supports the idea of accelerating scientific applications with stream processors.
In recent years, heterogeneous parallel systems have become a focus of research in the high performance computing field. Generally, in a heterogeneous parallel system, the CPU provides the basic computing environment and a special-purpose accelerator (the GPU in this paper) provides high computing performance. However, the overall performance of the system is prone to be limited by the data communication between the CPU and the GPU. Data communication is typically used to synchronize the array on the CPU with the stream (in AMD's terminology) on the GPU. In many cases, programmers simply add data synchronization for each GPU invocation independently. Programming in this manner is easy, but it may introduce much redundant communication, which dramatically degrades overall performance. To alleviate this problem, we propose a heuristic data communication scheduling approach based on the stream programming model. By analyzing the state transitions of each stream/array data pair, conditionally relaxing the synchronization strategy, and optimizing for branch and loop control structures, our approach can significantly reduce redundant data communication in most cases.
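The state-tracking idea can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the paper's heuristic: the three states, the `Pair` class, and the transfer counter are hypothetical, and plain Python lists stand in for the host array and the GPU stream.

```python
from enum import Enum


class State(Enum):
    SYNCED = 0        # array and stream hold the same data
    HOST_NEWER = 1    # array was modified after the last upload
    DEVICE_NEWER = 2  # stream was modified after the last download


class Pair:
    """A host array and its device stream, copied only when stale."""

    def __init__(self, data):
        self.array = list(data)
        self.stream = None
        self.state = State.HOST_NEWER
        self.transfers = 0

    def _to_device(self):
        if self.state is State.HOST_NEWER:
            self.stream = list(self.array)  # simulated upload
            self.transfers += 1
            self.state = State.SYNCED

    def _to_host(self):
        if self.state is State.DEVICE_NEWER:
            self.array = list(self.stream)  # simulated download
            self.transfers += 1
            self.state = State.SYNCED

    def run_kernel(self, fn):
        self._to_device()  # no copy if the stream is already current
        self.stream = [fn(x) for x in self.stream]
        self.state = State.DEVICE_NEWER

    def host_read(self):
        self._to_host()
        return list(self.array)

    def host_write(self, index, value):
        self._to_host()  # fetch device results before a partial update
        self.array[index] = value
        self.state = State.HOST_NEWER


pair = Pair([1, 2, 3])
for _ in range(3):
    pair.run_kernel(lambda x: x + 1)  # three back-to-back GPU invocations
final = pair.host_read()
```

Synchronizing around every invocation would cost six copies for these three kernels; tracking the pair's state reduces that to one upload and one download. Handling branches and loops, as the abstract mentions, would need analysis beyond this sketch.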
Event-driven programming has been a relatively hot topic in distributed systems development. Having worked on these systems for years, we now believe that it is not the best choice. Besides the well-known "stack ripping" problem, we argue that it greatly harms the composability of software modules. Preemptive threads also lack composability because of data races and locks. A lack of composability can result in systems with little vitality. Cooperative threading (or coroutines), by contrast, is almost free of this problem, so we advocate it as the primary concurrency model for most distributed systems.
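A short example may make the contrast concrete. The sketch below uses Python's `asyncio` coroutines as one instance of cooperative threading; the choice of `asyncio` and the names `worker` and `main` are assumptions for illustration and do not come from the paper.

```python
import asyncio


async def worker(state, iterations):
    """Increment a shared counter without any lock."""
    for _ in range(iterations):
        current = state["count"]      # read
        state["count"] = current + 1  # write; no await in between, so no
                                      # other coroutine can interleave here
        await asyncio.sleep(0)        # explicit, visible yield point


async def main():
    state = {"count": 0}
    # Four coroutines share one counter and run on a single thread.
    await asyncio.gather(*(worker(state, 1000) for _ in range(4)))
    return state["count"]


result = asyncio.run(main())
```

Because control changes hands only at `await`, the read-modify-write above cannot be interrupted, so no lock is needed and the worker keeps ordinary sequential control flow instead of being split across callbacks. With preemptive threads the same code would need a lock to be safe.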