Defect distribution prediction is a meaningful topic because software defects are the fundamental cause of many attacks and data loss. Building accurate prediction models can help developers find bugs and prioritize their testing efforts. Previous research focuses on exploring different machine learning algorithms based on the features that encode the characteristics of *** problem of data redundancy exists in software defect data sets, which has great influence on prediction *** propose a defect distribution prediction model (Deep belief network prediction model, DBNPM), a system for detecting whether a program module contains defects. The key insight of DBNPM is Deep belief network (DBN) technology, an effective deep learning technique in image processing and natural language processing, whose features are similar to defects in source *** results show that DBNPM can efficiently extract and process the data characteristics of source programs and that it performs better than Support vector machine (SVM), Locally linear embedding SVM (LLE-SVM), and Neighborhood preserving embedding SVM (NPE-SVM).
Godson2H is a complex SoC (System-on-Chip) of the Godson series: a 117 mm², 152-million-transistor chip fabricated in 65 nm CMOS LP/GP process technology. It integrates a 1 GHz processor core and abundant high- and low-speed peripheral IO interfaces. To overcome on-chip-variation problems in deep-submicron designs, many methods are adopted in the clock tree, and PVT detectors are integrated for debugging. To meet low-power constraints in different applications, most state-of-the-art low-power methods are carefully applied, such as dynamic voltage and frequency scaling, power gating, and aggressive multi-voltage design.
Moore's law continues to grant computer architects ever more transistors in the foreseeable future, and parallelism is the key to continued performance scaling in modern microprocessors. In this paper, the achiev...
Currently, large-scale vision and language models have significantly improved the performance of cross-modal retrieval tasks. However, large-scale models require a substantial amount of computing resources, so the exe...
ISBN: (print) 9781450307710
In this paper we present our experience tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy. Our optimization strategy is further guided by performance modeling based on micro-architecture benchmarks. Our optimizations include software pipelining, use of vector memory operations, and instruction scheduling. Our best CUDA algorithm achieves performance comparable to that of the latest CUBLAS library. We further improve upon this with an implementation in the native machine language, leading to a 20% increase in performance. That is, the achieved peak performance (efficiency) is improved from 302 Gflop/s (58%) to 362 Gflop/s (70%). Copyright 2011 ACM.
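The blocking idea in this abstract can be illustrated with a minimal CPU-side sketch in Python. This is not the authors' CUDA or machine-language kernel: the function name `blocked_matmul` and the `tile` parameter are hypothetical, and the comments only indicate where a GPU kernel would plausibly stage tiles in shared memory and keep accumulators in registers.

```python
def blocked_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), visiting the operands tile by tile.

    Illustrative only: `tile` is an assumed parameter, not a value from the paper.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C
        for j0 in range(0, m, tile):      # tile column of C
            for p0 in range(0, k, tile):  # tile along the shared dimension
                # On a GPU, the tiles of a and b touched below would be
                # loaded once into shared memory and reused by all threads.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]  # running sum, analogous to a register
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b))
```

The result is identical to an unblocked triple loop; only the order in which operands are visited changes, which is what allows each loaded tile to be reused many times before it is evicted.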
ISBN: (print) 9781595939036
Though many research groups have explored the design methodology of cluster system software stacks, few works discuss what constitutes a good one. In this paper, we choose four criteria spanning the lifecycle of a cluster system software stack to evaluate its design methodology: code reusability, evolvability, adaptability, and manageability. According to these four criteria, we have proposed a management service-based layered design methodology and built a complete cluster system software stack for both scientific and business computing. Our practices and evaluations show that our design methodology has advantages over others in terms of the proposed criteria. Copyright 2007 ACM.
In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods that rely on circular queues predominantly implemented using indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps in stencil computations, so that circular queues can be implemented effectively and in a balanced manner by both shared memory and registers. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93X over methods that use circular queues implemented with shared memory only.
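As a rough illustration of the underlying circular queue idea only, the following Python sketch streams the rows of a 2D grid through a three-slot queue while applying a 5-point averaging stencil. It does not model the hybrid register/shared-memory placement or the multi-time-step reuse described in the abstract; the function name `stencil_sweep` and the choice of stencil are assumptions made for the example.

```python
def stencil_sweep(grid):
    """Apply a 5-point average to interior points, one streamed row at a time.

    Illustrative sketch: only the three most recent rows are kept, in a
    circular queue indexed by row number modulo 3.
    """
    rows, cols = len(grid), len(grid[0])
    queue = [None, None, None]  # circular queue holding the last three rows
    out = []
    for r in range(rows):
        queue[r % 3] = grid[r]  # overwrite the oldest slot with the new row
        if r >= 2:
            up = queue[(r - 2) % 3]
            mid = queue[(r - 1) % 3]
            down = queue[r % 3]
            out.append([
                (up[c] + down[c] + mid[c - 1] + mid[c + 1] + mid[c]) / 5.0
                for c in range(1, cols - 1)
            ])
    return out


if __name__ == "__main__":
    grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
    print(stencil_sweep(grid))
```

Because each incoming row replaces the oldest slot, no data is copied between slots as the sweep advances; on a GPU, the question the abstract addresses is which of these slots should live in registers and which in shared memory.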
Although the genetic algorithm has been widely used in the polarity optimization of mixed polarity Reed-Muller (MPRM) logic circuits, few studies have taken into account the polarity conversion sequence. In order to improve the efficiency of polarity optimization of MPRM logic circuits, we propose an efficient and fast polarity optimization approach (FPOA) that considers the polarity conversion sequence. The main idea behind the FPOA is that, firstly, the best polarity conversion sequence of the polarity set waiting for evaluation is obtained by using the proposed hybrid genetic algorithm (HGA); secondly, each polarity in the polarity set is converted according to the best polarity conversion sequence obtained by HGA. Our proposed FPOA is implemented in C, and a comparative analysis is presented for MCNC benchmark circuits. The experimental results show that, for circuits with more variables, the FPOA is highly effective in improving the efficiency of polarity optimization of MPRM logic circuits compared with the traditional polarity optimization approach, which neglects the polarity conversion sequence, and the improved polarity optimization approach with a heuristic technique.
The superconducting rapid single flux quantum (RSFQ) logic circuit has the characteristics of high speed and low power consumption, making it an attractive candidate for future supercomputers. However, computer-aided ...
With the explosion of the amount of data, analytics applications require much higher performance and scalability. However, traditional DBMS encounters the tough obstacle of scalability, and could not handle big data e...