Applying graph clustering algorithms to real-world networks requires overcoming two main challenges: the lack of prior knowledge and limited scalability. This paper proposes a novel method based on the topological fea...
The graph isomorphism problem has applications in many fields, such as chemistry, computer science, electronics, and network theory. However, the exponential complexity of the algorithm makes testing time-consuming. In ...
Environment-driven adaptation is an important means of ensuring software integrity. When confronted with scenarios not anticipated during the development stage, the predefined adaptability of the software should be adjusted...
Heterogeneous parallel systems have become popular in general-purpose computing and even in high-performance computing. Many studies focus on harnessing heterogeneous parallel processing for better performance. However, energy optimization for heterogeneous systems has not been well studied. Because processors differ in performance and energy consumption, energy optimization techniques for heterogeneous systems differ from existing methods designed for homogeneous systems. Besides the typical voltage scaling method, reasonable task partitioning is also essential for optimizing energy consumption on heterogeneous systems. By partitioning a data-parallel task and mapping the sub-tasks onto several processors, one can achieve better performance and reduced energy consumption. As specialized accelerators reduce the computation cost, the communication overhead becomes more prominent. Therefore, task partition optimization should consider the computation improvement and the communication overhead together to achieve higher energy efficiency. Task partitioning and voltage scaling are typically not orthogonal: each influences the effect of the other in the energy optimization problem. To use both knobs efficiently, this paper proposes an integer linear programming (ILP) based energy-optimal solution designed for heterogeneous systems. We present a case study of optimizing the MGRID benchmark on a typical CPU-GPU heterogeneous system. The experimental results demonstrate that the proposed method can exploit the heterogeneity of different processors and achieve improved energy efficiency.
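The interaction between task partitioning and voltage scaling can be illustrated with a small sketch. It is not the paper's ILP formulation: it swaps in exhaustive enumeration over a discretized search space, and every constant (frequency levels, idle power, link rate) and every function name is a hypothetical value chosen for illustration only.

```python
from itertools import product

# Hypothetical (throughput in work units/s, active power in W) per frequency level.
CPU_LEVELS = [(10.0, 40.0), (20.0, 95.0)]
GPU_LEVELS = [(40.0, 120.0), (80.0, 250.0)]
CPU_IDLE_W, GPU_IDLE_W = 10.0, 25.0   # assumed idle power draw
LINK_RATE = 200.0                     # assumed work units/s copied to the GPU
LINK_POWER_W = 15.0                   # assumed extra power while copying


def energy(total_work, gpu_tenths, cpu_level, gpu_level):
    """Return (joules, makespan) when gpu_tenths/10 of the work runs on the GPU."""
    gpu_work = total_work * gpu_tenths / 10
    cpu_work = total_work - gpu_work
    cpu_rate, cpu_power = cpu_level
    gpu_rate, gpu_power = gpu_level
    transfer_time = gpu_work / LINK_RATE      # communication overhead
    cpu_time = cpu_work / cpu_rate
    gpu_compute = gpu_work / gpu_rate
    makespan = max(cpu_time, transfer_time + gpu_compute)
    # Each processor draws active power while computing and idle power otherwise.
    joules = cpu_power * cpu_time + CPU_IDLE_W * (makespan - cpu_time)
    joules += gpu_power * gpu_compute + GPU_IDLE_W * (makespan - gpu_compute)
    joules += LINK_POWER_W * transfer_time
    return joules, makespan


def best_configuration(total_work, deadline=None):
    """Enumerate partition ratios and frequency levels; keep the lowest-energy one."""
    best = None
    for tenths, c, g in product(range(11), range(len(CPU_LEVELS)), range(len(GPU_LEVELS))):
        joules, makespan = energy(total_work, tenths, CPU_LEVELS[c], GPU_LEVELS[g])
        if deadline is not None and makespan > deadline:
            continue
        candidate = (joules, makespan, tenths / 10, c, g)
        if best is None or candidate < best:
            best = candidate
    return best  # None when no configuration meets the deadline


if __name__ == "__main__":
    print(best_configuration(100.0))
```

Because the partition ratio and the frequency levels are searched jointly, the sketch can pick a split that would look suboptimal if either knob were tuned alone. An ILP solver would replace the enumeration once the search space grows.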
Power management has become a first-order consideration in high-performance computing. Many recent studies focus on optimizing the performance of a computer system within a given power budget. However, most existing solutions adopt a fixed-period control mechanism and are transparent to the running applications. Although an application-transparent control mechanism has relatively good portability, it exhibits low efficiency in accelerator-based heterogeneous parallel systems. In typical accelerator-based parallel systems, different processing units have widely different processing speeds and power consumption. Under a given power constraint, choosing which processor to slow down and scheduling a parallel task onto different processors for maximum performance differ from the corresponding problems in homogeneous systems and have not been well studied. The motivating example in this paper shows that, to harness heterogeneous parallel processing efficiently, one should not only perform dynamic voltage/frequency scaling (DVFS) to meet the power budget, but also tune the parallel task scheduling to adapt to the changes. In this paper, we propose a heterogeneity-aware peak power management method, which extends the existing application-transparent power controller with an application-aware power controller. First, we theoretically analyze the conditions for maximum performance under a given power budget for heterogeneous systems. Based on this result, we provide a power-constrained parallel task partition algorithm, which coordinates parallel task partitioning and voltage scaling for heterogeneous processing units to achieve the optimal performance under a system power budget. Finally, we evaluate the proposed method on a typical CPU-GPU heterogeneous system and validate the superiority of the application-aware power controller over the existing method.
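A toy model can make the coupling between DVFS and task partitioning concrete. The sketch below is not the paper's algorithm: it assumes a perfectly divisible task, no communication cost, and hypothetical frequency levels, under which the fastest partition for a given frequency pair is the one where both processors finish at the same time. All names and numbers are illustrative assumptions.

```python
# Hypothetical (throughput in work units/s, power in W) per frequency level.
CPU_LEVELS = [(10.0, 40.0), (15.0, 60.0), (20.0, 95.0)]
GPU_LEVELS = [(40.0, 120.0), (60.0, 170.0), (80.0, 250.0)]


def plan_under_power_cap(total_work, power_cap_w):
    """Pick frequency levels and a work split that minimize runtime under a power cap.

    Returns (makespan, gpu_share, cpu_rate, gpu_rate), or None if even the
    lowest levels exceed the cap.
    """
    best = None
    for cpu_rate, cpu_power in CPU_LEVELS:
        for gpu_rate, gpu_power in GPU_LEVELS:
            if cpu_power + gpu_power > power_cap_w:
                continue  # this frequency pair violates the power budget
            # In this toy model the balanced split is optimal for the pair:
            # both processors finish together, so neither waits for the other.
            gpu_share = gpu_rate / (cpu_rate + gpu_rate)
            makespan = total_work / (cpu_rate + gpu_rate)
            candidate = (makespan, gpu_share, cpu_rate, gpu_rate)
            if best is None or candidate < best:
                best = candidate
    return best


if __name__ == "__main__":
    print(plan_under_power_cap(100.0, 300.0))
```

With these assumed numbers, a 300 W cap is best met by slowing the CPU and keeping the GPU at its top level, and the work split shifts toward the GPU accordingly. A controller that only scaled frequencies and kept the old split would leave one processor idle.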
The Message Passing Interface (MPI) is the de facto standard for writing high-performance message-passing applications on distributed memory systems. Designing effective applications and predicting the performance of future systems requires an accurate communication model. In this paper, we discuss the characteristics of current systems and MPI implementations, then propose a more complete communication model, named LoGPX, which comprehensively captures the influences of the MPI communication protocol, the invocation time of communication primitives, hardware resources, and network contention. Based on the model, we obtain the condition under which the message transmission cost reaches its infimum, and we show that, when some of its factors are ignored, the LoGPX model degenerates to several popular models such as LogP, LogGP, LoGPC and LogGPO.
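For orientation, the point-to-point costs of two of the simpler models named above can be written down directly. The sketch uses the standard LogP and LogGP expressions for a single message; it does not implement LoGPX, whose parameters are not given in this abstract. Function and parameter names are illustrative.

```python
def logp_message_time(latency, overhead):
    """End-to-end time of one small message under LogP: send overhead,
    network latency, then receive overhead."""
    return overhead + latency + overhead


def loggp_message_time(latency, overhead, gap_per_byte, size_bytes):
    """End-to-end time of one size_bytes-long message under LogGP.

    The first byte behaves as in LogP; each further byte adds gap_per_byte (G).
    """
    if size_bytes < 1:
        raise ValueError("size_bytes must be at least 1")
    return overhead + (size_bytes - 1) * gap_per_byte + latency + overhead


if __name__ == "__main__":
    # Parameter values are arbitrary; times are in the same unit as the inputs.
    print(logp_message_time(latency=5.0, overhead=1.0))
    print(loggp_message_time(latency=5.0, overhead=1.0, gap_per_byte=0.5, size_bytes=101))
```

For a one-byte message the LogGP expression reduces to the LogP one, which mirrors the kind of degeneration between models that the abstract describes.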
The conventional LRU (Least Recently Used) replacement policy can significantly degrade the overall performance of the shared cache of Chip Multi-Processors (CMPs) when the aggregate working set of multiple co-scheduled applications cannot fit in the cache. Different applications have different inherent cache access behavior, and a replacement policy should take this into account to obtain more performance benefit. This paper proposes the application cache Behavior Identification based Insertion Policy (BIIP) for managing the shared cache in CMPs. BIIP uses the cache access behavior of each co-scheduled application to choose a suitable replacement policy. Our evaluation using a full-system CMP simulator shows that, on a 4-core CMP with 16 SPEC CPU2006 workloads, BIIP improves the overall throughput on average by 14.8% over the baseline LRU policy, by 11.2% over the prevailing cache partitioning scheme UCP, and by 5.6% and 7.2% over two other shared cache replacement policies, PIPP and TADIP, respectively. Moreover, BIIP requires a total storage overhead of no more than several counters per core and does not require changes to the current cache structure.
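Why the insertion position matters can be seen in a small simulation. The sketch below does not implement BIIP, whose details are not given in this abstract. It only contrasts classic LRU, which inserts new blocks at the most-recently-used position, with an LRU-position insertion variant, on a hypothetical cyclic trace that is one block larger than the cache.

```python
def count_hits(accesses, capacity, insert_at_mru):
    """Simulate one fully associative cache set and return the number of hits.

    The list is ordered from LRU (index 0) to MRU (last index). Hits always
    promote the block to MRU; insert_at_mru only controls where misses land.
    """
    stack = []
    hits = 0
    for block in accesses:
        if block in stack:
            hits += 1
            stack.remove(block)
            stack.append(block)        # promote to MRU on a hit
            continue
        if len(stack) == capacity:
            stack.pop(0)               # evict the block in the LRU position
        if insert_at_mru:
            stack.append(block)        # classic LRU insertion
        else:
            stack.insert(0, block)     # insert at the LRU position instead
    return hits


# Hypothetical trace: 5 blocks accessed cyclically, 10 rounds, 4-entry cache.
CYCLIC_TRACE = [block for _ in range(10) for block in range(5)]

if __name__ == "__main__":
    print("LRU insertion hits:", count_hits(CYCLIC_TRACE, 4, insert_at_mru=True))
    print("LRU-position insertion hits:", count_hits(CYCLIC_TRACE, 4, insert_at_mru=False))
```

On this trace, classic LRU evicts every block just before it is reused, while LRU-position insertion keeps part of the working set resident. A policy that identifies per-application behavior, as the abstract describes, would choose between such insertion choices per application.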