Applying graph clustering algorithms to real-world networks requires overcoming two main challenges: the lack of prior knowledge and the scalability issue. This paper proposes a novel method based on the topological fea...
Power management has become a first-order consideration in the high-performance computing field. Many recent studies focus on optimizing the performance of a computer system within a given power budget. However, most existing solutions adopt a fixed-period control mechanism and are transparent to the running applications. Although application-transparent control offers relatively good portability, it is inefficient in accelerator-based heterogeneous parallel systems, where different processing units differ greatly in processing speed and power consumption. Under a given power constraint, the questions of which processor to slow down and how to schedule a parallel task across different processors for maximum performance differ from those in homogeneous systems and have not been well studied. The motivating example in this paper shows that, to harness heterogeneous parallel processing efficiently, one should not only apply dynamic voltage/frequency scaling (DVFS) to meet the power budget but also tune parallel task scheduling to adapt to these changes. In this paper, we propose heterogeneity-aware peak power management, which extends an existing application-transparent power controller with an application-aware power controller. First, we theoretically analyze the conditions for maximum performance under a power budget in heterogeneous systems. Based on this result, we provide a power-constrained parallel task partition algorithm, which coordinates parallel task partitioning and voltage scaling across heterogeneous processing units to achieve optimal performance within a system power budget. Finally, we evaluate the proposed method on a typical CPU-GPU heterogeneous system and validate the superiority of the application-aware power controller over the existing method.
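To make the coordination between task partitioning and voltage scaling concrete, here is a minimal sketch under stated assumptions. It is not the paper's algorithm, whose optimality conditions are not given in the abstract. It assumes perfectly divisible work with no transfer cost, a hypothetical table of DVFS operating points (`CPU_LEVELS`, `GPU_LEVELS`) with made-up speed and power values, and a load-balancing partition in which each unit receives work in proportion to its speed so both finish together. An exhaustive search over frequency pairs then picks the fastest pair that fits the power budget.

```python
from itertools import product

# Hypothetical DVFS operating points: frequency (GHz) -> (throughput in work units/s, power in W).
# Illustrative numbers only; they are not taken from the paper.
CPU_LEVELS = {1.2: (10.0, 40.0), 2.0: (16.0, 70.0), 2.8: (20.0, 110.0)}
GPU_LEVELS = {0.6: (60.0, 120.0), 0.9: (85.0, 170.0), 1.2: (100.0, 230.0)}


def partition(work, cpu_speed, gpu_speed):
    """Split divisible work in proportion to speed so CPU and GPU finish together."""
    cpu_work = work * cpu_speed / (cpu_speed + gpu_speed)
    return cpu_work, work - cpu_work


def best_config(work, budget_w):
    """Exhaustively pick the CPU/GPU frequency pair with the shortest runtime under the power budget.

    Returns (runtime_s, cpu_freq, gpu_freq, cpu_work, gpu_work), or None if no pair fits.
    """
    best = None
    for (fc, (sc, pc)), (fg, (sg, pg)) in product(CPU_LEVELS.items(), GPU_LEVELS.items()):
        if pc + pg > budget_w:
            continue  # this operating point violates the power cap
        cpu_work, gpu_work = partition(work, sc, sg)
        runtime = cpu_work / sc  # equals gpu_work / sg by construction
        if best is None or runtime < best[0]:
            best = (runtime, fc, fg, cpu_work, gpu_work)
    return best
```

With these made-up numbers, `best_config(1010.0, 250.0)` selects the 2.0 GHz CPU / 0.9 GHz GPU pair and gives the CPU 160 of the 1010 work units. The GPU's top level would exceed the 250 W budget with any CPU setting, so the search has to decide which unit to slow down and then re-balance the partition.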
GPUs offer higher computing-unit density than contemporary CPUs and thus exhibit much higher power consumption despite their higher power efficiency. Power consumption has become an important issue affecting GPU applications, which makes low-power optimization techniques for GPUs necessary. Software prefetching is an efficient way to alleviate the memory wall problem: it overlaps computation with memory access latency. However, software prefetching incurs some power overhead because it increases the number and density of instructions. The performance gain must therefore be balanced against the power overhead when applying this optimization. To address this problem, we first analyze the multi-threaded execution model of GPUs and validate the potential for software prefetching optimization. We then present a software prefetching method that improves the performance of GPU programs. For two different objectives, energy optimization under a performance constraint and performance optimization under a power constraint, we discuss optimization methods based on software prefetching and dynamic voltage scaling. Experimental results show that our method can efficiently optimize energy consumption (performance) under a performance (power) constraint.
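The two objectives above can be illustrated with a toy analytical model. This is a hypothetical sketch, not the paper's method, and every constant is an assumption. Compute time scales as 1/f while memory stall time does not. Prefetching hides a fraction `overlap` of the memory time but adds a fraction `extra_insn` of dynamic power. Dynamic power is modeled as proportional to f³. Each objective is solved by scanning a hypothetical list of DVFS levels.

```python
# Hypothetical DVFS levels (GHz) and model constants; illustrative only.
LEVELS_GHZ = [0.6, 0.8, 1.0, 1.2]
STATIC_W = 20.0
DYN_COEFF = 50.0  # W per GHz^3


def runtime_s(f, compute_gcycles, mem_s, overlap):
    # Compute time scales with 1/f; memory stall time does not.
    # Prefetching is assumed to hide a fraction `overlap` of the memory time.
    return compute_gcycles / f + (1.0 - overlap) * mem_s


def power_w(f, extra_insn):
    # Dynamic power modeled as proportional to f^3 (voltage scaled with frequency);
    # prefetch instructions inflate dynamic power by the fraction `extra_insn`.
    return STATIC_W + DYN_COEFF * (1.0 + extra_insn) * f ** 3


def min_energy_under_deadline(t_max, compute_gcycles, mem_s, overlap=0.0, extra_insn=0.0):
    """Energy optimization under a performance constraint: returns (energy_J, f, runtime_s)."""
    best = None
    for f in LEVELS_GHZ:
        t = runtime_s(f, compute_gcycles, mem_s, overlap)
        if t > t_max:
            continue  # too slow to meet the deadline
        e = power_w(f, extra_insn) * t
        if best is None or e < best[0]:
            best = (e, f, t)
    return best


def max_perf_under_power_cap(p_cap, compute_gcycles, mem_s, overlap=0.0, extra_insn=0.0):
    """Performance optimization under a power constraint: returns (runtime_s, f)."""
    best = None
    for f in LEVELS_GHZ:
        if power_w(f, extra_insn) > p_cap:
            continue  # exceeds the power cap
        t = runtime_s(f, compute_gcycles, mem_s, overlap)
        if best is None or t < best[0]:
            best = (t, f)
    return best
```

Take a 20 s deadline, 10 Gcycles of compute and 5 s of memory time. Here prefetching (`overlap=0.6`, `extra_insn=0.1`) lets the deadline be met at 0.6 GHz instead of 0.8 GHz and lowers total energy. With other constants the instruction overhead could outweigh the hidden latency, which is the trade-off the abstract describes.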
The graph isomorphism problem has applications in many fields, such as chemistry, computer science, electronics, and network theory. However, the exponential complexity of the algorithm makes testing time-consuming. In ...
Environment-driven adaptation is an important means of ensuring software integrity. When confronted with scenarios not anticipated during the development stage, the predefined adaptability of the software should be adjusted...
The reliability issue of Exascale systems is extremely serious. Traditional passive fault-tolerance methods, such as rollback-recovery, can no longer fully guarantee system reliability because of their large execution overhead and long recovery time. Active fault tolerance is expected to become another important fault-tolerance approach for Exascale systems. Focusing on system failure prediction, a key step of active fault tolerance, we construct an online failure prediction model and study effective methods for preprocessing system status data. To improve the accuracy and real-time performance of current methods, the proposed Improved Adaptive Semantic Filter (IASF) method periodically processes the latest system logs and filters useless information out of them according to their semantics. Adopting the main idea of the Vector Space Model (VSM), IASF creates an Event Vector for each log record. By calculating the cosine of the angle between vectors, it evaluates the semantic correlation between different log records and then performs temporal and spatial redundancy filtering that accounts for the bursty nature of log records. IASF is insensitive to the type of system log and does not rely on any expert system or domain knowledge. Experimental results show that about 99.6% of useless log records are eliminated after applying IASF.
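The filtering idea can be sketched as follows: a VSM vector per log record, cosine similarity between records, then temporal and spatial redundancy filtering. This is not IASF itself. The abstract does not specify the tokenization, similarity threshold, or window lengths, so the bag-of-words term-frequency vectors, the 0.8 threshold, and the `temporal_s`/`spatial_s` windows below are all assumptions. "Temporal" here means a similar record from the same node, and "spatial" means a similar record from a different node within a shorter window.

```python
import math
import re
from collections import Counter


def event_vector(message):
    """Term-frequency vector for one log message (a simple VSM representation)."""
    # Keep alphabetic tokens only, so numeric fields such as CPU ids do not break similarity.
    return Counter(re.findall(r"[a-z]+", message.lower()))


def cosine(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_filter(records, threshold=0.8, temporal_s=300.0, spatial_s=60.0):
    """Drop log records that are semantically redundant with a recently kept record.

    records: iterable of (timestamp_s, node, message), sorted by timestamp.
    Temporal filter: similar message from the SAME node within temporal_s seconds.
    Spatial filter: similar message from a DIFFERENT node within spatial_s seconds.
    """
    horizon = max(temporal_s, spatial_s)
    recent = []  # kept records still inside the filtering horizon: (ts, node, vector)
    kept = []
    for ts, node, msg in records:
        recent = [r for r in recent if ts - r[0] <= horizon]
        vec = event_vector(msg)
        redundant = False
        for kts, knode, kvec in recent:
            window = temporal_s if knode == node else spatial_s
            if ts - kts <= window and cosine(vec, kvec) >= threshold:
                redundant = True
                break
        if not redundant:
            recent.append((ts, node, vec))
            kept.append((ts, node, msg))
    return kept
```

Suppose a burst of "machine check" messages arrives from one node and a near-identical one arrives from a second node a few seconds later. Only the first record of the burst survives. An unrelated "link error" record, and a recurrence long after both windows have expired, are kept.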
The Message Passing Interface (MPI) is the de facto standard for writing high-performance message-passing applications on distributed-memory systems. Designing efficient applications and predicting the performance of future systems requires an accurate communication model. In this paper, we discuss the characteristics of current systems and MPI implementations and then propose a more complete communication model, named LoGPX, which jointly captures the influences of the MPI communication protocol, the invocation time of communication primitives, hardware resources, and network contention. Based on the model, we derive the condition under which the message transmission cost reaches its infimum, and we show that, by ignoring certain factors, the LoGPX model degenerates to several popular models such as LogP, LogGP, LoGPC, and LogGPO.
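The abstract does not give LoGPX's additional protocol, invocation-time, and contention terms, so this sketch shows only the standard LogGP base model that LoGPX is said to degenerate to. The cost of one k-byte point-to-point message is $L + 2o + (k-1)G$. For a short message ($k = 1$) this reduces to LogP's $L + 2o$, a small instance of the "ignore some factors" reduction. The class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class LogGP:
    """LogGP parameters (times in microseconds)."""
    L: float  # network latency
    o: float  # CPU overhead to send or receive one message
    g: float  # minimum gap between consecutive message injections (unused for a single message)
    G: float  # gap per byte for long messages

    def p2p_time(self, k):
        """End-to-end time of one k-byte message: o + (k - 1) * G + L + o."""
        return self.o + (k - 1) * self.G + self.L + self.o

    def logp_short_message_time(self):
        """LogP's cost for a single short message: L + 2o."""
        return self.L + 2 * self.o
```

For example, with `LogGP(L=5.0, o=1.5, g=4.0, G=0.01)` a 1-byte message costs 8.0 µs, the same as the LogP value, and a 1001-byte message costs 18.0 µs.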
The conventional LRU (Least Recently Used) replacement policy can significantly degrade the overall performance of the shared cache in Chip Multi-Processors (CMPs) when the aggregate working set of multiple co-scheduled applications cannot fit in the cache. Different applications have different inherent cache access behavior, and a replacement policy should take this into account to derive more performance benefit. This paper proposes BIIP (application cache Behavior Identification based Insertion Policy), a replacement policy for managing the shared cache in CMPs. BIIP exploits the cache access behavior characteristics of each co-scheduled application to choose the replacement policy intelligently. Our evaluation using a full-system CMP simulator shows that, on a 4-core CMP with 16 SPEC CPU2006 workloads, BIIP improves overall throughput on average by 14.8%, 11.2%, 5.6%, and 7.2% over the baseline LRU policy, the prevailing cache partitioning scheme UCP, and two other shared-cache replacement policies, PIPP and TADIP, respectively. Moreover, BIIP requires a total storage overhead of no more than several counters per core and does not require changes to the existing cache structure.
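The general idea of a per-application insertion policy can be sketched on a single shared cache set. This is a hypothetical illustration, not BIIP. The abstract does not describe BIIP's identification rule, so the sketch assumes a simple one: an application whose observed hit rate stays below `low_reuse_threshold` after `min_samples` accesses is treated as low-reuse (for example, streaming). Its new lines are inserted at the LRU position instead of MRU. Hits still promote lines to MRU, and the per-application counters stand in for the "several counters per core" mentioned above.

```python
class SharedCacheSet:
    """One set of a shared cache with an LRU recency stack (index 0 = MRU)."""

    def __init__(self, ways, low_reuse_threshold=0.1, min_samples=32):
        self.ways = ways
        self.stack = []     # (app, tag) entries, MRU first
        self.hits = {}      # per-application hit counters (hypothetical)
        self.accesses = {}  # per-application access counters (hypothetical)
        self.threshold = low_reuse_threshold
        self.min_samples = min_samples

    def is_low_reuse(self, app):
        n = self.accesses.get(app, 0)
        if n < self.min_samples:
            return False  # not enough history yet: behave like plain LRU
        return self.hits.get(app, 0) / n < self.threshold

    def access(self, app, tag):
        """Access a line; returns True on hit, False on miss."""
        self.accesses[app] = self.accesses.get(app, 0) + 1
        key = (app, tag)
        if key in self.stack:
            self.hits[app] = self.hits.get(app, 0) + 1
            self.stack.remove(key)
            self.stack.insert(0, key)  # promote to MRU on hit
            return True
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the LRU line
        if self.is_low_reuse(app):
            self.stack.append(key)  # LRU insertion: evicted first unless reused
        else:
            self.stack.insert(0, key)  # MRU insertion, as plain LRU would do
        return False
```

Consider a streaming application `"S"` sharing a 4-way set with a reuse-friendly application `"A"`. Once `"S"` is classified as low-reuse, its lines cycle through the LRU slot and `"A"`'s two lines survive. Setting `low_reuse_threshold=0.0` degenerates to plain LRU, and `"A"`'s lines are then evicted by the stream.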