The performance gap for high performance applications has been widening over time. High level program transformations are critical to improve applications' performance, many of which concern the determination of o...
详细信息
A data-driven method was proposed to realistically animate garments on human poses in reduced space. Firstly, a gradient based method was extended to generate motion sequences and garments were simulated on the sequen...
详细信息
A data-driven method was proposed to realistically animate garments on human poses in reduced space. Firstly, a gradient based method was extended to generate motion sequences and garments were simulated on the sequences as our training data. Based on the examples, the proposed method can fast output realistic garments on new poses. Our framework can be mainly divided into offline phase and online phase. During the offline phase, based on linear blend skinning(LBS), rigid bones and flex bones were estimated for human bodies and garments, respectively. Then, rigid bone weight maps on garment vertices were learned from examples. In the online phase, new human poses were treated as input to estimate rigid bone transformations. Then, both rigid bones and flex bones were used to drive garments to fit the new poses. Finally, a novel formulation was also proposed to efficiently deal with garment-body penetration. Experiments manifest that our method is fast and accurate. The intersection artifacts are fast removed and final garment results are quite realistic.
Building distributed applications is difficult mostly because of concurrency management. Existing approaches primarily include events and threads. Researchers and developers have been debating for decades to prove whi...
详细信息
Building distributed applications is difficult mostly because of concurrency management. Existing approaches primarily include events and threads. Researchers and developers have been debating for decades to prove which is superior. Although the conclusion is far from obvious, this long debate clearly shows that neither of them is perfect. One of the problems is that they are both complex and error-prone. Both events and threads need the programmers to explicitly manage concurrencies, and we believe it is just the source of difficulties. In this paper, we propose a novel approach—superscalar communication, in which concurrencies are automatically managed by the runtime system. It dynamically analyzes the programs to discover potential concurrency opportunities; and it dynamically schedules the communication and the computation tasks, resulting in automatic concurrent execution. This approach is inspired by the idea of superscalar technology in modern microprocessors, which dynamically exploits instruction-level parallelism. However, hardware superscalar algorithms do not fit software in many aspects, thus we have to design a new scheme completely from scratch. Superscalar communication is a runtime extension with no modification to the language, compiler or byte code, so it is good at backward compatibility. Superscalar communication is likely to begin a brand new research area in systems software, which is characterized by dynamic optimization for networking programs.
Software security is one of the most important aspects of software quality, but errors in programs are still inevitable. Despite the development of static and dynamic analysis, the useful but costly manual analysis is...
详细信息
Software security is one of the most important aspects of software quality, but errors in programs are still inevitable. Despite the development of static and dynamic analysis, the useful but costly manual analysis is still heavily used. In this paper, we propose a human computation game based software error detection mechanism. The purpose of our mechanism is to improve the effectiveness and efficiency of manual error detection. The framework includes four main steps: 1) get all suspicious error locations by static tools according to different security properties;2) slice the programs from each error location;3) construct games for each slice and show them to players;4) collect game information and generate the final error report. We evaluate our approach by instantiating it for detecting errors of buffer overflow. The results show that our technique is successful in practice.
In light of GPUs’ powerful floating-point operation capacity,heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing(HPC)...
详细信息
In light of GPUs’ powerful floating-point operation capacity,heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing(HPC).However,due to the complexity of programming on GPUs,porting a large number of existing scientific computing applications to the heterogeneous parallel systems remains a big *** OpenMP programming interface is widely adopted on multi-core CPUs in the field of scientific *** effectively inherit existing OpenMP applications and reduce the transplant cost,we extend OpenMP with a group of compiler directives,which explicitly divide tasks among the CPU and the GPU,and map time-consuming computing fragments to run on the GPU,thus dramatically simplifying the *** have designed and implemented MPtoStream,a compiler of the extended OpenMP for AMD’s stream processing *** experimental results show that programming with the extended directives deviates from programming with OpenMP by less than 11% modification and achieves significant speedup ranging from 3.1 to 17.3 on a heterogeneous system,incorporating an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU,over the execution on the Xeon CPU alone.
Heavy ion experiments were performed on D flip-flop(DFF) and TMR flip-flop(TMRFF) fabricated in a 65-nm bulk CMOS process. The experiment results show that TMRFF has about 92% decrease in SEU crosssection compared to ...
详细信息
Heavy ion experiments were performed on D flip-flop(DFF) and TMR flip-flop(TMRFF) fabricated in a 65-nm bulk CMOS process. The experiment results show that TMRFF has about 92% decrease in SEU crosssection compared to the standard DFF design in static test mode. In dynamic test mode, TMRFF shows much stronger frequency dependency than the DFF design, which reduces its advantage over DFF at higher operation frequency. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such small improvement in the SEU performance of the TMR design may warrant reconsideration for its use in hardening design.
Power management has become one of the first-order considerations in high performance computing field. Many recent studies focus on optimizing the performance of a computer system within a given power budget. However,...
详细信息
ISBN:
(纸本)9780769545769
Power management has become one of the first-order considerations in high performance computing field. Many recent studies focus on optimizing the performance of a computer system within a given power budget. However, most existing solutions adopt fixed period control mechanism and are transparent to the running applications. Although the application-transparent control mechanism has relatively good portability, it exhibits low efficiency in accelerator-based heterogeneous parallel systems. In typical accelerator-based parallel systems, different processing units have largely different processing speeds and power consumption. Under a given power constraint, how to choose the processor to be slowed down and how to schedule a parallel task onto different processors for the maximum performance are different from those in homogeneous systems and have not been well studied. From the motivating example in this paper, we could find that in order to efficiently harness the heterogeneous parallelprocessing, one should not only perform dynamic voltage/frequency scaling (DVFS) to meet the power budget, but also tune the parallel task scheduling to adapt to the changes. In this paper, we propose a heterogeneity-aware peak power management, which extends existing application-transparent power controller with an application-aware power controller. Firstly, we theoretically analyze the conditions for the maximum performance given a power budget for heterogeneous systems. Based on this result, we provide a power-constrained parallel task partition algorithm, which coordinates parallel task partition and voltage scaling for heterogeneous processing units to achieve the optimal performance given a system power budget. Finally, we evaluate the proposed method on a typical CPU-GPU heterogeneous system, and validate the superiority of application-aware power controller over the existing method.
Force-directed approach is one of the most widely used methods in graph drawing research. There are two main problems with the traditional force-directed algorithms. First, there is no mature theory to ensure the conv...
详细信息
Force-directed approach is one of the most widely used methods in graph drawing research. There are two main problems with the traditional force-directed algorithms. First, there is no mature theory to ensure the convergence of iteration sequence used in the algorithm and further, it is hard to estimate the rate of convergence even if the convergence is satisfied. Second, the running time cost is increased intolerablely in drawing largescale graphs, and therefore the advantages of the force-directed approach are limited in practice. This paper is focused on these problems and presents a sufficient condition for ensuring the convergence of iterations. We then develop a practical heuristic algorithm for speeding up the iteration in force-directed approach using a successive over-relaxation (SOR) strategy. The results of computational tests on the several benchmark graph datasets used widely in graph drawing research show that our algorithm can dramatically improve the performance of force-directed approach by decreasing both the number of iterations and running time, and is 1.5 times faster than the latter on average.
The Internet-based Virtual Computing Environment (iVCE) provides on-demand aggregation and autonomic collaboration mechanisms to facilitate the utilization of autonomous and dynamic Internet resources. Load balancing ...
详细信息
Nowadays, the performance of large-scale parallel computer system improves continuously, and the system scale becomes extremely large. Performance prediction has become an important approach to guide system design, im...
详细信息
ISBN:
(纸本)9783642233234
Nowadays, the performance of large-scale parallel computer system improves continuously, and the system scale becomes extremely large. Performance prediction has become an important approach to guide system design, implementation and optimization. Simulation method is the most widely used performance prediction technology for large-scale parallel computer system. In this paper, after analyzing the extant problems, we proposed a novel execution-driven performance simulation technology based on process-switch. We designed a simulation framework named PS-SIM, and implemented a prototype system based on MPICH2. Finally, we verified the proposed approach by experiments. Experimental results show that the approach has high accuracy and simulation performance.
暂无评论