As hardware technology and program optimization for high-performance computing (HPC) improve, the overall computing capability of supercomputers increases correspondingly. As a large-scale system and...
Job scheduling is crucial in high-performance computing (HPC): it decides which jobs are admitted to the system and when, and on which resources they are placed, by considering multiple scheduling...
Godson2H is a complex SoC (system-on-chip) in the Godson series: a 117 mm², 152-million-transistor chip fabricated in a 65 nm CMOS LP/GP process. It integrates a 1 GHz processor core and abundant high- and low-speed peripheral I/O interfaces. To overcome on-chip-variation problems in deep-submicron designs, many methods are adopted in the clock tree, and PVT detectors are integrated for debugging. To meet the low-power constraints of different applications, most state-of-the-art low-power methods are carefully applied, such as dynamic voltage and frequency scaling, power gating, and aggressive multi-voltage design.
Modern compilers use machine learning to derive, from prior experience, useful heuristics for newly encountered programs in order to accelerate the optimization process. However, prior experience might not be applicab...
In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods that rely on circular queues predominantly implemented using indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps in stencil computations, so that circular queues can be implemented effectively with both shared memory and registers in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93X over methods that use circular queues implemented with shared memory only.
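The circular-queue reuse pattern that this abstract builds on can be illustrated with a minimal CPU-side sketch in Python. It is not the paper's hybrid method and does not model registers or shared memory: it only shows how a 5-point stencil can sweep a grid while holding three rows in a fixed-size queue. The function names and the choice of a 5-point average are assumptions made for illustration.

```python
from collections import deque


def stencil_reference(grid):
    """Plain 5-point average over interior points; boundaries are copied unchanged."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return out


def stencil_circular_queue(row_stream, cols):
    """Same stencil, but rows arrive one at a time and only three are kept.

    `row_stream` is any iterable of rows (hypothetical interface).
    """
    window = deque(maxlen=3)  # circular queue: the oldest row is evicted on append
    out = []
    for row in row_stream:
        window.append(list(row))
        if len(window) == 1:
            out.append(list(row))  # first boundary row passes through
        elif len(window) == 3:
            up, mid, down = window
            new = mid[:]  # keeps the left and right boundary values
            for j in range(1, cols - 1):
                new[j] = (mid[j] + up[j] + down[j] + mid[j - 1] + mid[j + 1]) / 5.0
            out.append(new)
    if len(window) >= 2:
        out.append(list(window[-1]))  # last boundary row passes through
    return out
```

Each input row is read once and reused for three output rows, which is the data reuse a circular queue is meant to capture; on a GPU the open question the abstract addresses is where those queued rows should live.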
Fault injection plays a critical role in verifying software reliability. This paper presents a mutation-based fault injection method. Moreover, a strategy of using semantic-based mutators is proposed to impr...
Library functions and system calls have been major difficulties for automated testing. Input/output (I/O) functions are a common set of library functions. Testers have to interact with the test procedures if the te...
With the explosive growth in data volume, analytics applications require much higher performance and scalability. However, traditional DBMSs face serious scalability obstacles and cannot handle big data e...
This paper presents a methodology for high-level power modeling of cell-based processors. A flexible power model library, which can automatically generate detailed power data for the actual circuits of each part of a given processor, is developed and annotated dynamically for an architecture-level power simulator. With this method, the dynamic power, leakage power, and even area and cell counts can be accurately estimated. A preliminary power validation on a MIPS microprocessor shows the methodology to be effective and highly correlated with gate-level power analysis, with only small errors.
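As a rough illustration of what a cell-based power model library might aggregate, the following Python sketch sums per-cell dynamic power, leakage power, area, and cell counts for one processor block. The abstract does not give the paper's model, so the `CellModel` fields, the `estimate` function, and the activity-factor formulation are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellModel:
    """Per-cell characterization data (hypothetical fields and units)."""
    switch_energy_pj: float  # energy per output toggle, in picojoules
    leakage_nw: float        # static leakage, in nanowatts
    area_um2: float          # cell area, in square micrometres


def estimate(library, cell_counts, activity, freq_mhz):
    """Aggregate power and area for a block described by its cell counts.

    `activity` maps a cell name to its average toggles per clock cycle;
    cells missing from it are treated as idle.
    """
    dynamic_mw = leakage_mw = area_um2 = 0.0
    cells = 0
    for name, count in cell_counts.items():
        model = library[name]
        toggles = activity.get(name, 0.0)
        # pJ per toggle * toggles per cycle * MHz gives microwatts; /1000 gives mW
        dynamic_mw += count * toggles * model.switch_energy_pj * freq_mhz / 1000.0
        leakage_mw += count * model.leakage_nw / 1e6  # nW to mW
        area_um2 += count * model.area_um2
        cells += count
    return {"dynamic_mw": dynamic_mw, "leakage_mw": leakage_mw,
            "area_um2": area_um2, "cells": cells}
```

An architecture-level simulator could call such a function once per block with activity factors taken from the simulated workload, which is one plausible reading of "annotated dynamically".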
The superconducting rapid single flux quantum (RSFQ) logic circuit has the characteristics of high speed and low power consumption, making it an attractive candidate for future supercomputers. However, computer-aided ...