In this paper, we present the Tianhe-2 interconnect network and message passing services. We describe the architecture of the router and network interface chips, and highlight a set of hardware and software features that effectively support high-performance communication, including remote direct memory access, collective optimization, hardware-enabled reliable end-to-end communication, and user-level message passing services. Measured hardware performance results are also presented.
As electronic technology develops, the integration levels of CPUs and memories keep growing, and the speeds of communication devices improve. High-performance computing (HPC) systems consist of processing...
Predicting network latencies between Internet hosts can efficiently support large-scale Internet applications, e.g., file sharing services and overlay construction. Several studies use the Hyperbolic space to model t...
ISBN: 1595934081 (print)
Program mode is a regular trajectory of the execution of a program that is determined by the values of its input variables. By exploiting program modes, we can make Worst Case Execution Time (WCET) analysis more precise. This paper presents a novel method to automatically find program modes and calculate the WCET of programs. It consists of two phases. In phase one, we first automatically find the modes of a program by mode-relevant program slicing; we then compute the precondition for each mode using a path-wise test data generation method, which either yields the precondition or shows that the mode corresponds to an infeasible path. In phase two, we calculate the WCET estimate of each given mode for modern RISC processors with caches and pipelines. Experiments demonstrate the effectiveness of the method. Copyright 2006 ACM.
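As a rough illustration of how per-mode results such as those described above might be combined, the following sketch treats each mode as a precondition on the input plus a WCET estimate. The `Mode` class, the mode names, the preconditions, and the cycle counts are all hypothetical and are not taken from the paper; the sketch only shows why discarding infeasible paths and selecting the mode that matches an input can give a tighter bound than a single global estimate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Mode:
    name: str
    # Predicate over the input; None marks a path found to be infeasible.
    precondition: Optional[Callable[[int], bool]]
    wcet_cycles: int  # hypothetical per-mode WCET estimate


def feasible(modes: List[Mode]) -> List[Mode]:
    """Keep only the modes that have a satisfiable precondition."""
    return [m for m in modes if m.precondition is not None]


def overall_wcet(modes: List[Mode]) -> int:
    """Input-independent bound: the largest estimate among feasible modes."""
    return max(m.wcet_cycles for m in feasible(modes))


def wcet_for_input(modes: List[Mode], x: int) -> int:
    """Tighter bound when the input is known to satisfy some precondition."""
    matches = [m.wcet_cycles for m in feasible(modes) if m.precondition(x)]
    # Fall back to the overall bound if no precondition covers the input.
    return max(matches) if matches else overall_wcet(modes)


# Hypothetical modes for a program with one integer input.
MODES = [
    Mode("idle", lambda x: x == 0, 120),
    Mode("normal", lambda x: 0 < x < 100, 800),
    Mode("burst", lambda x: x >= 100, 2500),
    Mode("dead_path", None, 9000),  # infeasible, so it is excluded
]
```

With these made-up numbers, the infeasible path no longer inflates the overall bound, and an input known to fall in the `normal` mode gets that mode's smaller estimate.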
ISBN: 9780769532622 (print)
This paper presents ongoing work on an FMEA method for embedded safety-critical software based on formal analysis of the various dependence relations among software elements, which can improve the automation and precision of both system-level and detailed-level FMEA. These dependence relations are captured by formal models abstracted from the software design and implementation, and FMEA processes are proposed for both structural and object-oriented software. Initial case study results show the effectiveness of the approach. © 2008 IEEE.
In recent years, many C code static analyzers, with different abilities of bug detection, have appeared and been applied in various domains. There are so many choices that it becomes hard for programmers to know in de...
Recently, GPGPU has been widely adopted in the High Performance Computing (HPC) field. The limited global memory bandwidth poses a great challenge to many GPGPU programmers trying to exploit parallelism within the CPUGP...
The application of memristors in building hardware neural networks has attracted widespread interest, and may bring novel opportunities to neural computing. However, due to the limitation of programming precision, the c...
Proximity ranking according to end-to-end network distances (e.g., Round-Trip Time, RTT) can reveal detailed proximity information, which is important in network management and performance diagnosis in distributed sys...
OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may degrade rather than improve performance, because local-memory arrays no longer match well with the hardware and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that performs this transformation of GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
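To make the idea of removing local-memory arrays and barriers concrete, here is a minimal sketch that simulates one work-group sequentially in Python. It is not the paper's tool chain or actual OpenCL code: the 3-point sum kernel, the function names, and the zero-padded boundary handling are assumptions chosen only for illustration, and the array length is assumed to be a multiple of the work-group size.

```python
def gpu_style_workgroup(src, dst, group_id, local_size):
    """GPU-style work-group: stage data in a local tile, then compute."""
    base = group_id * local_size
    end = base + local_size
    # Phase 1: each work-item copies one element into the local tile;
    # slots 0 and local_size + 1 hold the halo (zero outside the array).
    tile = [0.0] * (local_size + 2)
    for lid in range(local_size):
        tile[lid + 1] = src[base + lid]
    tile[0] = src[base - 1] if base > 0 else 0.0
    tile[local_size + 1] = src[end] if end < len(src) else 0.0
    # In OpenCL, a barrier(CLK_LOCAL_MEM_FENCE) would separate the phases.
    # Phase 2: each work-item sums its three neighbours from the tile.
    for lid in range(local_size):
        dst[base + lid] = tile[lid] + tile[lid + 1] + tile[lid + 2]


def cpu_style_workgroup(src, dst, group_id, local_size):
    """CPU-style version: work-items coarsened into one loop, no tile, no barrier."""
    base = group_id * local_size
    n = len(src)
    for lid in range(local_size):
        i = base + lid
        left = src[i - 1] if i > 0 else 0.0
        right = src[i + 1] if i + 1 < n else 0.0
        dst[i] = left + src[i] + right  # read global memory directly


def run(kernel, src, local_size):
    """Launch the kernel over all work-groups (len(src) % local_size == 0)."""
    dst = [0.0] * len(src)
    for group_id in range(len(src) // local_size):
        kernel(src, dst, group_id, local_size)
    return dst
```

Both versions compute the same result; the CPU-style version simply drops the staging copy and the synchronization point, which is the kind of simplification the abstract describes for hardware where the cache already provides the locality.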