Mining log patterns to analyze faults in large-scale distributed systems is hampered by redundant and ambiguous noisy error logs. Existing work compresses logs at a coarse granularity from temporal and spatial views to remove redundancy, but it fails to preserve ambiguous logs that may truly relate to a fault, which misleads the fault characterization. By modeling error logs as time series and examining the similarity between each trash error log template and the target error log, the ambiguous error logs are kept and the affected patterns can be effectively removed. Experiments on a practical, complex service-based storage system show that up to 92% of the affected patterns can be filtered.
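The time-series view of error logs described above can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so Pearson correlation over per-bucket occurrence counts is an assumption here, and the function names `to_series` and `pearson` are hypothetical.

```python
from collections import Counter
from math import sqrt


def to_series(timestamps, bucket_seconds, num_buckets):
    """Turn raw log timestamps (seconds) into occurrence counts per time bucket."""
    counts = Counter(int(t // bucket_seconds) for t in timestamps)
    return [counts.get(i, 0) for i in range(num_buckets)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (assumed similarity measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no shape to compare
    return cov / sqrt(var_x * var_y)
```

Under these assumptions, a trash template whose series correlates strongly with the target error log would be treated as possibly fault-related and kept, while weakly correlated ones would be filtered. The threshold for "strongly" is not given in the abstract.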
Using the graphics processing unit (GPU) to accelerate general-purpose computation has attracted much attention from both academia and industry because of the GPU's powerful computing capacity, and optimizing GPU programs has become a popular research direction. To support general-purpose computing more efficiently, GPUs have integrated a general data cache to replace the existing software-managed on-chip memory. Consequently, improving the use of the data cache is vital to improving the performance of GPU programs. The foundation of cache locality optimization is efficient analysis and prediction of cache behavior. Unfortunately, existing cache miss analysis models are based on sequential programs and thus cannot be applied to GPU programs directly. In this paper, based on a deep analysis of the GPU's execution model, we propose, for the first time, a cache miss analysis model for GPU programs. We divide the problem into two subproblems: stack distance profile analysis of a single thread block, and cache contention analysis of multiple thread blocks. Experimental results from nine typical application kernels in the scientific computing field show that our method is efficient and can be used to guide cache locality optimizations for GPU programs.
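As background for the first subproblem, the following is a minimal sketch of the classic LRU stack distance computation on a sequential address trace. It is not the paper's GPU model: it assumes a fully associative LRU cache and ignores contention among thread blocks. The names `stack_distances` and `lru_misses` are hypothetical.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access in the trace.

    The distance is the number of distinct other addresses touched since the
    previous access to the same address, or None for a first (cold) access.
    """
    stack = []  # least recently used first, most recently used last
    distances = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(addr)  # addr is now the most recently used entry
    return distances


def lru_misses(trace, capacity):
    """Count misses in a fully associative LRU cache holding `capacity` lines."""
    return sum(
        1 for d in stack_distances(trace) if d is None or d >= capacity
    )
```

For the trace `["a", "b", "c", "a", "b", "b"]`, `stack_distances` returns `[None, None, None, 2, 2, 0]`, so a two-line cache misses five times and a three-line cache misses only on the three cold accesses. One pass over the trace therefore predicts the miss count for every cache size.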
This paper addresses fault recovery in transactional memory and proposes a fault recovery method based on parallel recomputing in transactional memory systems. The method uses the data versioning mechanism of the transactional memory system to avoid the extra cost of state saving, rolls back only a single transaction to avoid wasting the computing time of fault-free transactions, and adopts parallel recomputing to reduce the cost of fault recovery. The paper applies this method to OpenTM programs and proposes an implementation of parallel recomputing in OpenTM. Finally, the paper tests the performance of the method through a test […]. The experimental results show that, compared with fault recovery by rolling back a single transaction, the parallel recomputing method executes fault recovery quickly and accurately in a transactional memory system and scales well.
This work explores the feasibility of implementing IEEE 754-2008 quadruple-precision (Quad) elementary functions on recent FPGAs with plentiful embedded memories and DSP blocks. First, we analyze the implementation algorithms of Quad elementary functions in detail. Then, we present a special-purpose Very Long Instruction Word (VLIW) architecture for Quad elementary functions (QE-Processor). The proposed processor uses a unified hardware structure, equipped with multiple basic arithmetic units, to implement various Quad algebraic and transcendental functions; several tradeoffs between latency and resource usage are carefully planned to avoid unbalanced resource utilization. Performance is improved through the explicit parallelism of custom VLIW instructions. Finally, we prototype the QE-Processor on Xilinx Virtex-5 and Virtex-6 FPGA chips. The experimental results show that our design achieves correct rounding in more than 99.9% of cases. Moreover, the implementation on a Virtex-6 XC6VLX760-2FF1760 FPGA, running at 220 MHz, outperforms a parallel software approach based on OpenMP running on an Intel Xeon E5620 CPU at 2.40 GHz by a factor of 13X-20X for special function applications in the Boost library.
Satisfiability Modulo Theories (SMT) is an extension of SAT towards first-order logic (FOL). SMT solvers have proven highly scalable and efficient for problems based on some ground theories. However, SMT problems involving quantifiers and combinations of theories are a long-standing challenge, which has been a major bottleneck for the practical application of SMT solvers in some fields. We identify a decidable fragment of FOL involving quantifiers that cannot be solved by SMT solvers such as Z3 and CVC3, and show how to convert problems in this fragment into model checking problems.
A cloud needs a rapid and elastic resource supply capability because the resource demand of end users fluctuates. Multi-scale elastic resource binding is an important method for providing cloud services with this capability. The most challenging problem in multi-scale elastic resource binding is how to predict the dynamic resource demand of end users, and then to decide, based on the prediction, when and to what extent multi-scale resources need elastic binding. In this paper, we present a prediction model based on a Radial Basis Function (RBF) network, which is used to predict end users' resource demand in advance. Compared with current prediction methods, it offers faster prediction and higher prediction accuracy. We then use trace data (the bandwidth demand of Web-type cloud services) collected from a real-world cloud provider, ChinaCache, as the training and testing data set to validate the method. Finally, we evaluate the predicted results using general prediction accuracy metrics. The results show that the prediction model based on the RBF network is able to resolve the decision problem in multi-scale elastic resource binding.
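To make the RBF network idea concrete, here is a minimal sketch of an exact-interpolation RBF model in pure Python. The abstract does not give the network structure or training procedure, so everything here is an assumption: Gaussian basis functions, one center per training sample, scalar inputs (for example a time index), and output weights obtained by solving a linear system. The names `fit_rbf` and `predict_rbf` are hypothetical.

```python
from math import exp


def gaussian(r2, width):
    """Gaussian radial basis function of a squared distance r2."""
    return exp(-r2 / (2.0 * width ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rbf(xs, ys, width):
    """Fit output weights, using every training input as a center."""
    phi = [[gaussian((xi - cj) ** 2, width) for cj in xs] for xi in xs]
    return solve(phi, ys)


def predict_rbf(x, centers, weights, width):
    """Evaluate the fitted network at input x."""
    return sum(
        w * gaussian((x - c) ** 2, width) for c, w in zip(centers, weights)
    )
```

A demand predictor would more plausibly feed a window of recent demand samples as the input vector and use fewer centers than samples. This sketch only shows the basis-function expansion and the linear solve for the output layer.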
As the scale of parallel computer systems increases, simulation has become an important tool for performance prediction during system development. The task mapping approach is an important factor in simulation performance. In this paper, to solve the task mapping problem in performance simulation, we propose a task mapping algorithm based on simulated annealing and verify its correctness and effectiveness by experiments. Experimental results show that the algorithm is efficient and can solve large-scale problems at a lower time cost.
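A generic simulated annealing loop for task mapping can be sketched as follows. The abstract does not specify the cost function, move operator, or cooling schedule, so the choices below are assumptions: the cost is the maximum processor load plus the weight of communication edges that cross processors, a move reassigns one random task, and the temperature decays geometrically. All names and default parameters are hypothetical.

```python
import math
import random


def mapping_cost(mapping, task_load, comm, num_procs):
    """Assumed cost: heaviest processor load plus cross-processor traffic."""
    loads = [0.0] * num_procs
    for task, proc in enumerate(mapping):
        loads[proc] += task_load[task]
    cross = sum(w for (a, b), w in comm.items() if mapping[a] != mapping[b])
    return max(loads) + cross


def anneal_mapping(task_load, comm, num_procs,
                   steps=5000, t0=10.0, alpha=0.999, seed=0):
    """Search for a low-cost task-to-processor mapping by simulated annealing."""
    rng = random.Random(seed)  # seeded for reproducible runs
    current = [rng.randrange(num_procs) for _ in task_load]
    cur_cost = mapping_cost(current, task_load, comm, num_procs)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(steps):
        task = rng.randrange(len(current))
        old_proc = current[task]
        current[task] = rng.randrange(num_procs)  # move: reassign one task
        new_cost = mapping_cost(current, task_load, comm, num_procs)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        else:
            current[task] = old_proc  # reject: undo the move
        temp *= alpha  # geometric cooling
    return best, best_cost
```

Here `task_load` is a list of per-task costs and `comm` maps task index pairs `(a, b)` to communication weights. Recomputing the full cost on every move keeps the sketch short; an incremental cost update would be the natural optimization for large task graphs.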
With the explosive growth in the creation of and demand for information and data, efficient storage management is essential in cloud storage systems. As an important component of storage management, a storage allocation scheme aims to keep redundancy low while achieving high reliability. However, these two aims are hard to reconcile. Considering the practical situation of cloud systems, we propose a systematic storage allocation scheme that addresses both. We also study the impact of several factors on data reliability.
The development of multi-core processors makes the parallelization of traditional sequential algorithms increasingly important. Meanwhile, transactional memory serves as a good parallel programming model. This paper takes advantage of software transactional memory to parallelize the Multi-Exit Asymmetric AdaBoost algorithm for face detection. The parallel version is evaluated on three different implementations of software transactional memory. The experimental results show that the transactional memory based parallelization outperforms the traditional lock-based approach. A speedup of nearly seven is achieved on an eight-core machine.