As parallel computer systems grow in scale, simulation has become an important tool for performance prediction during system development. The task mapping approach is an important factor in simulation performance. To solve the task mapping problem in performance simulation, this paper proposes a task mapping algorithm based on simulated annealing and verifies its correctness and effectiveness experimentally. The results show that the algorithm is efficient and can solve large-scale problems at a lower time cost.
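A minimal sketch of how simulated annealing can search for a task mapping is given below. The abstract does not state the cost function, the move set, or the cooling schedule, so everything here is an assumption made for illustration: the cost is the heaviest processor load plus the communication volume that crosses processors, a move reassigns one task, and the temperature decays geometrically. The names `mapping_cost` and `anneal_mapping` are hypothetical.

```python
import math
import random


def mapping_cost(mapping, task_load, comm, n_procs):
    """Assumed cost: heaviest processor load plus cross-processor traffic."""
    loads = [0.0] * n_procs
    for task, proc in enumerate(mapping):
        loads[proc] += task_load[task]
    # comm maps a pair of task indices (a, b) to the data volume they exchange.
    cut = sum(vol for (a, b), vol in comm.items() if mapping[a] != mapping[b])
    return max(loads) + cut


def anneal_mapping(task_load, comm, n_procs, t_start=10.0, t_end=0.01,
                   alpha=0.95, moves_per_temp=50, seed=0):
    """Return (best_mapping, best_cost) found by simulated annealing."""
    rng = random.Random(seed)
    n_tasks = len(task_load)
    # Start from a round-robin mapping: task i runs on processor i mod n_procs.
    current = [i % n_procs for i in range(n_tasks)]
    current_cost = mapping_cost(current, task_load, comm, n_procs)
    best, best_cost = current[:], current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            task = rng.randrange(n_tasks)
            old_proc = current[task]
            new_proc = rng.randrange(n_procs)
            if new_proc == old_proc:
                continue
            current[task] = new_proc
            new_cost = mapping_cost(current, task_load, comm, n_procs)
            delta = new_cost - current_cost
            # Always accept improvements; accept a worse mapping with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current_cost = new_cost
                if current_cost < best_cost:
                    best, best_cost = current[:], current_cost
            else:
                current[task] = old_proc  # reject the move
        t *= alpha  # geometric cooling
    return best, best_cost
```

Because the best mapping seen so far is tracked separately from the current one, the returned cost is never worse than that of the initial round-robin mapping under this assumed cost function.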
With the explosive growth in the creation of, and demand for, information and data, cloud storage systems require efficient storage management. As an important component of storage management, a storage allocation scheme aims to keep redundancy low while achieving high reliability. However, these two goals are hard to reconcile. Considering the practical situation of cloud systems, we propose a systematic storage allocation scheme that addresses both. We also study the impact of a number of factors on data reliability.
The use of memristors to build hardware neural networks has attracted widespread interest and may bring new opportunities to neural computing. However, because programming precision is limited, the conductance of a memristor, which represents the stored information, may deviate from its theoretical value and thus introduce errors into the neural computing results. In this paper, we analyze the impact of imprecise programming on hardware neural networks through Monte Carlo simulation of a feedback layer model. The results show that the fault tolerance of the neural network adapts well to these errors, which further demonstrates the potential of building neural networks with memristors.
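The kind of Monte Carlo experiment described above can be sketched as follows. This is a toy under stated assumptions and not the paper's feedback layer model: it assumes a Hopfield-style feedback network that stores a single pattern, models imprecise programming as Gaussian relative noise on each weight, and counts how often one synchronous update still returns the stored pattern. The function name `recall_success_rate` and the noise parameter `sigma` are hypothetical.

```python
import random


def sign(x):
    """Threshold activation: +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1


def recall_success_rate(pattern, sigma, trials=200, seed=0):
    """Fraction of trials in which the noisy network recalls `pattern`.

    Each ideal weight w_ij = p_i * p_j (no self-connections) is programmed
    as w_ij * (1 + e), with e drawn from N(0, sigma). This is an assumed
    noise model standing in for limited programming precision.
    """
    rng = random.Random(seed)
    n = len(pattern)
    successes = 0
    for _ in range(trials):
        recalled = []
        for i in range(n):
            total = 0.0
            for j in range(n):
                if i == j:
                    continue
                ideal = pattern[i] * pattern[j]
                programmed = ideal * (1.0 + rng.gauss(0.0, sigma))
                total += programmed * pattern[j]
            recalled.append(sign(total))
        successes += recalled == pattern
    return successes / trials
```

Sweeping `sigma` upward from zero and plotting the returned rate gives a simple picture of how much programming error such a network tolerates before recall starts to fail.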
Network coding offers a new solution for IP congestion control, since several buffered packets can be encoded together and removed from the buffer as a single coded packet. This may significantly reduce packet loss during congestion, but at the cost of building redundant paths. However, minimizing the overhead of redundant paths turns out to be an NP-hard problem. In this paper, we propose a novel approximation algorithm called FlowGrouping, which transforms the redundant path building problem into a limited clique partition problem by increasing edge weights, and finds a good approximate solution within O(n^3) computation time.
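The idea of encoding several buffered packets into one coded packet can be illustrated with XOR coding, a common form of network coding. The abstract does not say which coding scheme is used, so XOR is an assumption here, and the helper name `xor_encode` is hypothetical. A receiver that already holds one of the two original packets (for example, via a redundant path) can recover the other from the coded packet.

```python
def xor_encode(p1: bytes, p2: bytes) -> bytes:
    """XOR two packets byte by byte, zero-padding the shorter one."""
    n = max(len(p1), len(p2))
    a = p1.ljust(n, b"\x00")
    b = p2.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


# Two buffered packets leave the congested router as one coded packet.
coded = xor_encode(b"hello", b"hi")

# XOR is its own inverse, so decoding reuses the same function. The
# receiver must know the original length to strip the zero padding.
recovered_first = xor_encode(coded, b"hi")
recovered_second = xor_encode(coded, b"hello")[:2]
```

The need for the receiver to obtain one original packet by another route is what ties this scheme to the redundant paths mentioned in the abstract.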
This paper addresses error detection in transactional memory and proposes a new error detection method based on redundant transactions (EDRT). The method creates a copy of every transaction, executes the original transactions and their copies on suitable processor cores, and detects errors by comparing the execution results. EDRT uses the data-versioning mechanism of transactional memory to obtain an approximately minimal set of data to compare for error detection, and it does so transparently and online. Finally, the paper validates EDRT with 5 test programs, including 4 SPLASH-2 benchmarks. The experimental results show that the average error detection cost is about 3.68% relative to the whole program, and only about 12.07% relative to the transactional parts of the program.
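The redundant-execution idea can be sketched in a few lines. This is an illustration under assumptions, not the EDRT implementation: it assumes lazy data versioning, where each transaction buffers its writes privately, and it treats the buffered write set as the data to compare. The two copies run one after the other here, whereas the method described above runs them on separate processor cores. The names `run_transaction` and `execute_with_detection` are hypothetical.

```python
def run_transaction(body, memory):
    """Run `body` against a private write buffer and return its write set."""
    write_set = {}

    def read(addr):
        # A transaction sees its own buffered writes first.
        return write_set.get(addr, memory[addr])

    def write(addr, value):
        write_set[addr] = value

    body(read, write)
    return write_set


def execute_with_detection(body, memory, copy_body=None):
    """Run a transaction and its copy; commit only if the write sets match.

    `copy_body` exists only so a faulty copy can be injected for testing;
    by default the copy is the same code as the original.
    """
    original = run_transaction(body, memory)
    replica = run_transaction(copy_body or body, memory)
    if original != replica:
        return False  # mismatch: error detected, nothing is committed
    memory.update(original)
    return True


def transfer(read, write):
    """Example transaction: move 3 units from account a to account b."""
    write("a", read("a") - 3)
    write("b", read("b") + 3)


def faulty_transfer(read, write):
    """Same transaction with a simulated soft error in one result."""
    write("a", read("a") - 3)
    write("b", read("b") + 4)
```

Comparing only the write sets, rather than all of memory, is one way to keep the comparison data small, in the spirit of the approximately minimal comparison set mentioned above.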
The Internet has fundamentally changed the model of software development, the demands on software quality, and the process of sharing software resources. An Internet-based environment for trustworthy software production is recognized as a key topic in software engineering by both academia and the software industry. This paper introduces the concepts and models of trustworthy software that guide the design of the Trustie environment. Trustie supports the sharing of trustworthy software components through an evolving software repository, and supports collaborative software development on a customizable development platform powered by a software production line framework. Finally, layered research and application practices based on Trustie give a preliminary demonstration of the effectiveness and promise of this environment.
Non-uniform distribution of memory accesses across cache sets has been recognized as one source of inefficiency in cache architectures on single-core platforms, and several schemes target the problem to boost performance. As chip multiprocessors (CMPs) become the mainstream processor design choice, how this non-uniform distribution affects cache management on CMPs is an open question. We address the question by presenting several cache management schemes for CMP platforms that aim to balance the memory access distribution across cache sets in shared or private caches. We show that on CMP platforms with multiprogrammed workloads: (a) for shared caches, the non-uniform memory access distribution across cache sets is biased by the fact that multiple applications run concurrently and share the cache capacity, and the scheme we put forward to exploit the non-uniformity on shared caches yields little to no benefit and can even degrade performance; (b) for caches organized as private caches, directly adapting a scheme that targets this kind of non-uniformity outperforms the baseline private cache design by 2% on average; (c) however, for a private-cache-based management scheme that we proposed, further effort to exploit this kind of non-uniformity on top of that scheme also yields little to no benefit. We therefore conclude that on CMP platforms with multiprogrammed workloads, the non-uniform distribution of memory accesses across cache sets is partially circumvented by the interactions between multiple applications. Efforts to exploit the non-uniformity for further benefit may prove futile on CMPs.
The Quiet DDoS attack has become one of the most severe threats to network security, because it relies entirely on legitimate TCP flows while distributing its destination IP addresses to evade the various countermeasures deployed in the network. However, this high degree of destination IP distribution is itself a characteristic of the attack, and we believe it causes part of the attack flow to deviate from the behavior habits of network users. Inspired by this observation, we propose a novel method to counter the Quiet DDoS attack based on the network behavior habit of users (NBHU). We also simulate our method on the NS2 platform, and the results show that it can reduce the performance of the attack.
The reliability problem of Exascale systems is extremely serious. Traditional passive fault-tolerance methods, such as rollback-recovery, can no longer fully guarantee system reliability because of their large execution overhead and long recovery time. Active fault tolerance is expected to become another important fault-tolerance approach for Exascale systems. Focusing on system failure prediction, a key step of active fault tolerance, we construct an online failure prediction model and study an effective method for preprocessing system status. To improve on the accuracy and real-time performance of current methods, the proposed Improved Adaptive Semantic Filter (IASF) method processes the latest system logs regularly and filters out useless information according to its semantics. Adopting the main idea of the Vector Space Model (VSM), IASF creates an Event Vector for each log record. It evaluates the semantic correlation between log records by calculating the cosine of the angle between their vectors, and then performs temporal and spatial redundancy filtering that takes the bursty nature of log records into account. IASF is insensitive to the type of system log and does not rely on any expert system or domain knowledge. The experimental results show that the system can eliminate about 99.6% of useless log records after applying IASF.
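The vector-space step can be sketched as follows. The abstract does not specify how an Event Vector is built or which thresholds IASF uses, so this sketch assumes a plain bag-of-words term-frequency vector, a fixed time window, and a fixed cosine threshold. It shows only a temporal filter, not the spatial one, and the names `event_vector`, `cosine`, and `temporal_filter` are hypothetical.

```python
import math
from collections import Counter


def event_vector(message):
    """Assumed Event Vector: term frequencies of the whitespace tokens."""
    return Counter(message.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_sq = (sum(c * c for c in u.values())
               * sum(c * c for c in v.values()))
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def temporal_filter(records, window=60.0, threshold=0.8):
    """Drop records that closely resemble a recently kept record.

    `records` is a list of (timestamp_seconds, message) sorted by time.
    A record is redundant if a kept record within `window` seconds has
    cosine similarity of at least `threshold`.
    """
    kept = []
    for ts, msg in records:
        vec = event_vector(msg)
        redundant = any(
            ts - kept_ts <= window and cosine(vec, kept_vec) >= threshold
            for kept_ts, _, kept_vec in kept
        )
        if not redundant:
            kept.append((ts, msg, vec))
    return [(ts, msg) for ts, msg, _ in kept]
```

In a burst of near-identical error messages, only the first record inside each window survives, while a later recurrence outside the window is kept as a new event.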
ISBN: 9781612842080 (print)
Although general-purpose GPUs offer relatively high computing capacity, they also have high power consumption compared with general-purpose CPUs. Low-power techniques targeting GPUs will therefore be among the hottest topics in the future. On the other hand, in several application domains users are unwilling to sacrifice performance to save power. In this paper, we propose an effective kernel fusion method that reduces GPU power consumption without performance loss. Instead of executing multiple kernels serially, the proposed method fuses several kernels into one larger kernel. Because most consecutive kernels in an application have data dependencies and cannot be fused directly, we split each large kernel into multiple slices using strip-mining and then fuse independent sliced kernels into one kernel. Based on the CUDA programming model, we propose three kernel fusion implementations, each targeting a particular case. Based on different strip-mining methods, we also propose two fusion mechanisms, called invariant-slice fusion and variant-slice fusion; the latter adapts better to the requirements of the kernels being fused. The experimental results confirm that the proposed kernel fusion method effectively reduces GPU power consumption.
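The slicing-then-fusing idea can be modeled on the host side in plain Python. This is a conceptual sketch only and not CUDA code or the paper's implementation: it assumes two elementwise kernels where the second consumes the output of the first, a fixed slice length (loosely in the spirit of invariant-slice fusion), and a fused step that pairs slice i of the first kernel with slice i-1 of the second, since those two pieces of work do not depend on each other. All function names are hypothetical.

```python
def kernel_a(xs):
    """First kernel: an elementwise producer."""
    return [2 * x for x in xs]


def kernel_b(ys):
    """Second kernel: an elementwise consumer of kernel_a's output."""
    return [y + 1 for y in ys]


def run_serial(data):
    """Baseline: launch the two dependent kernels one after the other."""
    return kernel_b(kernel_a(data))


def run_fused(data, slice_len):
    """Strip-mine both kernels and fuse independent slices per step."""
    slices = [data[i:i + slice_len] for i in range(0, len(data), slice_len)]
    out = []
    pending = None  # kernel_a output for the previous slice
    for step in range(len(slices) + 1):
        # One fused step: kernel_a on slice `step` and kernel_b on the
        # previous slice. Neither reads what the other writes in this step.
        produced = kernel_a(slices[step]) if step < len(slices) else None
        if pending is not None:
            out.extend(kernel_b(pending))
        pending = produced
    return out
```

The fused schedule produces the same result as the serial one; on a real GPU the point of such a schedule would be to run the two independent slices inside a single kernel launch.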