Determinism is very useful to multithreaded programs in debugging, testing, and other tasks. Many deterministic approaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, these systems are either inefficient or target a single purpose, which is not flexible. In this paper, we propose an efficient and flexible deterministic framework for multithreaded programs. Our framework implements determinism in two steps: relaxed determinism and strong determinism. Relaxed determinism solves data races efficiently by using a proper weak memory consistency model. After that, we implement strong determinism by solving lock contentions deterministically. Since we can apply different approaches for these two steps independently, our framework provides a spectrum of deterministic choices, including a nondeterministic system (fast), a weak deterministic system (fast and conditionally deterministic), a DMT system, and a deterministic replay system. Our evaluation shows that the DMT configuration of this framework can even outperform a state-of-the-art DMT system.
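The core of the second step is making contended lock acquisitions independent of the OS scheduler. As a minimal sketch only, assuming a simple round-robin turn scheme rather than the paper's actual mechanism (the class name DeterministicLock, the fixed thread count, and the workload are invented for illustration), the C++ snippet below shows how a turn token can force the same acquisition order on every run:

```cpp
// Sketch: deterministic lock acquisition via round-robin turns.
// This is an illustrative assumption, not the paper's implementation.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

class DeterministicLock {
public:
    explicit DeterministicLock(int num_threads) : num_threads_(num_threads) {}

    // Block until it is `tid`'s turn; the holder keeps the turn while inside.
    void lock(int tid) {
        std::unique_lock<std::mutex> lk(turn_mutex_);
        turn_cv_.wait(lk, [&] { return turn_ == tid; });
    }

    void unlock(int tid) {
        std::unique_lock<std::mutex> lk(turn_mutex_);
        turn_ = (tid + 1) % num_threads_;   // deterministic hand-off
        turn_cv_.notify_all();
    }

private:
    const int num_threads_;
    int turn_ = 0;                          // which thread may enter next
    std::mutex turn_mutex_;
    std::condition_variable turn_cv_;
};

int main() {
    constexpr int kThreads = 4;
    DeterministicLock lock(kThreads);
    long long counter = 0;

    std::vector<std::thread> workers;
    for (int tid = 0; tid < kThreads; ++tid) {
        workers.emplace_back([&, tid] {
            for (int i = 0; i < 1000; ++i) {
                lock.lock(tid);
                ++counter;          // critical section entered in the same
                lock.unlock(tid);   // thread order on every execution
            }
        });
    }
    for (auto& t : workers) t.join();
    std::printf("counter = %lld\n", counter);   // always 4000
}
```

Because the turn only advances when the current holder releases it, the order in which threads enter the critical section does not depend on how the OS schedules them; this is the kind of property the strong-determinism step builds on once data races are already handled by the relaxed step.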
Semi-supervised learning (SSL) exploits abundant unlabeled examples to boost the performance of learning from limited labeled examples. Due to its strong discriminative power, SSL has been widely applied to various real...
Detecting concurrency bugs is becoming increasingly important. Many pattern-based concurrency bug detectors focus on the specific types of interleavings that are correlated to concurrency bugs. To detect multiple type...
One of the most significant challenges for routing protocols in mobile networks is coping with the unpredictable motion and the unreliable behaviour of mobile nodes. In this paper, we present a hierarchical r...
Regenerating codes have been proposed to achieve an optimal trade-off curve between the amount of storage space and the network traffic for repair. However, existing repair schemes based on regenerating codes are inad...
To provide timely results for ‘Big Data Analytics’, it is crucial to satisfy deadline requirements for MapReduce jobs in production environments. In this paper, we propose a deadline-oriented task scheduling approac...
Moving object detection is important in traffic video analysis, and an increasing number of algorithms are being applied to it. Most of these algorithms are time-consuming and cannot satisfy real-time...
Stragglers can delay jobs and seriously reduce cluster efficiency. Much research has been devoted to this problem, such as Blacklist[8], speculative execution[1, 6], and Dolly[8]. In this paper, we put forward ...
With the rapid growth of open source software, choosing software from many alternatives has become a great challenge. Traditional ranking approaches mainly focus on the characteristics of the software themselves, such...
ISBN: (print) 9781509032068
As we are approaching the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low energy consumption becomes increasingly important. The GPU is a widely used accelerator in most recently deployed supercomputers. It adopts massive multithreading to hide long latency and has high energy efficiency. In contrast to their strong computing power, GPUs have few on-chip resources, with several MB of fast on-chip memory storage per SM (Streaming Multiprocessor). GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design. Because of this severe deficiency in on-chip memory, the benefit of the GPU's high computing capacity is dramatically reduced by poor cache performance, which limits system performance and energy efficiency. In this paper, we put forward a locality protection scheme that makes full use of data locality within the fixed cache capacity. We present a Locality Protected method based on instruction PC (LPP) to improve GPU performance. First, we use a PC-based collector to gather the reuse information of each cache line. After obtaining the dynamic reuse information of a cache line, an intelligent cache allocation unit (ICAU) coordinates the reuse information with the LRU (Least Recently Used) replacement policy to identify the cache line with the least locality for eviction. The results show that LPP provides up to a 17.8% speedup and an average improvement of 5.5% over the baseline method.
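To make the eviction flow concrete, here is a small software model, a sketch under assumed parameters (a 4-way set, a PC-indexed reuse table, and a synthetic streaming-versus-reused workload; none of these names or sizes come from the paper), of how per-PC reuse counts gathered on hits can be combined with LRU to pick the line with the least locality as the victim:

```cpp
// Illustrative model only, not the paper's hardware design: each cache line
// remembers the PC that filled it, a PC-indexed table counts reuses, and the
// victim is the line whose PC shows the least reuse, with LRU as tie-breaker.
#include <array>
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct Line {
    bool     valid = false;
    uint64_t tag   = 0;
    uint64_t pc    = 0;   // PC of the access that filled this line
    uint64_t lru   = 0;   // smaller = older
};

class Set {
public:
    // Returns true on a hit. `tick` is a global access counter used for LRU.
    bool access(uint64_t tag, uint64_t pc, uint64_t tick,
                std::unordered_map<uint64_t, int>& reuse_table) {
        for (auto& l : ways_) {
            if (l.valid && l.tag == tag) {
                ++reuse_table[l.pc];   // PC-based collector: record a reuse
                l.lru = tick;
                return true;
            }
        }
        // Miss: pick the victim with the least PC reuse, then the oldest LRU.
        Line* victim = &ways_[0];
        for (auto& l : ways_) {
            if (!l.valid) { victim = &l; break; }
            int rv = reuse_table[victim->pc];
            int rl = reuse_table[l.pc];
            if (rl < rv || (rl == rv && l.lru < victim->lru)) victim = &l;
        }
        victim->valid = true;
        victim->tag = tag;
        victim->pc = pc;
        victim->lru = tick;
        return false;
    }

private:
    std::array<Line, 4> ways_;   // 4-way set, chosen only for illustration
};

int main() {
    Set set;
    std::unordered_map<uint64_t, int> reuse_table;   // per-PC reuse counts
    uint64_t tick = 0;
    int hits = 0, total = 0;

    auto touch = [&](uint64_t tag, uint64_t pc) {
        hits += set.access(tag, pc, ++tick, reuse_table);
        ++total;
    };

    // A PC (0x200) whose three lines are reused every round competes with a
    // streaming PC (0x100) that never reuses its lines; the reuse-aware
    // victim choice keeps the reusable lines resident.
    for (int round = 0; round < 100; ++round) {
        for (uint64_t t = 0; t < 3; ++t) touch(t, 0x200);
        touch(1000 + round, 0x100);
    }
    std::printf("hit rate: %.2f\n", static_cast<double>(hits) / total);
}
```

In this toy run the lines installed by the reused PC stay resident while the streaming PC's lines are sacrificed, which is the qualitative behavior the PC-based collector and the ICAU are described as achieving.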