With the rapid development of open source software, various elements such as OSS, developers, users and online posts, across different communities and their interactions constitute a novel software ecosystem. Most of ...
Recently, GPGPU has been widely adopted in the High Performance Computing (HPC) field. The limited global memory bandwidth poses a great challenge to many GPGPU programmers trying to exploit parallelism within the CPUGP...
Although Statecharts has gained widespread use as a formalism for modeling reactive real-time systems, testing these systems still confronts some difficulties, of which a major one is the existence of numerous and com...
OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may have the opposite effect on performance, because local-memory arrays no longer match the hardware well and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that performs this transformation of GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
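As a rough illustration of the idea in the abstract above (not the authors' tool chain or its output), the following C sketch shows what a coarsened, CPU-friendly form of a simple kernel might look like. The 3-point stencil and the names `stencil_workgroup` and `stencil_cpu` are hypothetical, chosen only for this example. A GPU-specific version would typically stage `in[]` into a `__local` tile and call `barrier()`; here the work-items of a group are serialized into a loop, reads go straight to `in[]`, and neither the tile nor the barrier remains.

```c
#include <stddef.h>

/* Hypothetical kernel: out[i] = in[i-1] + in[i] + in[i+1],
 * with out-of-range neighbours treated as 0. */

/* Process one whole work-group in a single call. The loop over i stands in
 * for the individual work-items (thread coarsening). No local-memory tile
 * and no barrier are used: each iteration reads directly from in[]. */
static void stencil_workgroup(const float *in, float *out, size_t n,
                              size_t group_id, size_t group_size)
{
    size_t begin = group_id * group_size;
    size_t end = begin + group_size;
    if (end > n)
        end = n;                              /* last group may be partial */
    for (size_t i = begin; i < end; ++i) {
        float left  = (i > 0)     ? in[i - 1] : 0.0f;
        float right = (i + 1 < n) ? in[i + 1] : 0.0f;
        out[i] = left + in[i] + right;
    }
}

/* Run every work-group in sequence. Assumes group_size > 0. A real runtime
 * would instead schedule the groups across CPU cores. */
static void stencil_cpu(const float *in, float *out, size_t n,
                        size_t group_size)
{
    size_t groups = (n + group_size - 1) / group_size;
    for (size_t g = 0; g < groups; ++g)
        stencil_workgroup(in, out, n, g, group_size);
}
```

The inner loop has unit stride and no cross-iteration dependence, which is the kind of shape a compiler's auto-vectorizer can usually handle. The analysis that decides when a local-memory array is safe to remove is the paper's contribution and is not modeled here.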
The fully coupled pressure-based algorithm is widely recognised for its superior convergence and robustness in solving incompressible flow problems. However, the increased scale of equations and the difficulty in solv...
We propose a parallel exact diagonalization method for solving the large-scale Hubbard model. The core of this algorithm is the parallelization of the Lanczos algorithm, for which we propose a hierarchical communicati...
The Internet-based virtual computing environment (iVCE) is a novel network computing *** characteristics of growth, autonomy, and diversity of Internet resources present great challenges to resource sharing in *** DHT overlay (DHT for short) technique has various advantages such as high scalability, low latency, and desirable availability, and is thus an important approach to realizing efficient resource *** construction is a key technique for structured overlays that realizes basic overlay functions including dynamic maintenance and message *** this paper, we first introduce the traditional techniques of DHT topology construction, focusing mainly on dynamic maintenance and message routing of typical DHTs, DHT indexing techniques for complex queries, and DHT grouping techniques for matching domain *** then present recent advances in DHT topology construction techniques in iVCE taking advantage of the characteristics of Internet ***, we discuss the future of DHT topology construction techniques.
This paper addresses the optimization of parallel simulators for large-scale parallel systems and applications. Such simulators are often based on parallel discrete event simulation with conservative or optimistic pro...
Distributed real-time systems are one important type of real-time system. They are usually characterized by both reactive and real-time factors, and it has long been recognized that how to automatically check such ...
Data replication can be used to reduce bandwidth consumption and access latency in the distributed system where users require remote access to large data objects. In this paper, according to the intrinsic characterist...