Simulation is an important method to evaluate future computersystems. Currently microprocessor architecture has switched to parallel, but almost all simulators remained at sequential stage, and the advantages brought...
详细信息
Simulation is an important method to evaluate future computersystems. Currently microprocessor architecture has switched to parallel, but almost all simulators remained at sequential stage, and the advantages brought by multi-core or many-core processors cannot be utilized. This paper presents a parallel simulator engine (SimK) towards the prevalent SMP/CMP platform, aiming at large-scale fine-grained computersystem simulation. In this paper, highly efficient synchronization, communication and buffer management policies used in SimK are introduced, and a novel lock-free scheduling mechanism that avoids using any atomic instructions is presented. To deal with the load fluctuation at light load case, a cooperated dynamic task migration scheme is proposed. Based on SimK, we have developed large-scale parallel simulators HppSim and HppNetSim, which simulate a full supercomputersystem and its interconnection network respectively. Results show that HppSim and HppNetSim both gain sound speedup with multiple processors, and the best normalized speedup reaches 14.95X on a two-way quad-core server.
Due to the huge size of patterns to be searched,multiple pattern searching remains a challenge to several newly-arising applications like network intrusion *** this paper,we present an attempt to design efficient mult...
详细信息
Due to the huge size of patterns to be searched,multiple pattern searching remains a challenge to several newly-arising applications like network intrusion *** this paper,we present an attempt to design efficient multiple pattern searching algorithms on multi-core *** observe an important feature which indicates that the multiple pattern matching time mainly depends on the number and minimal length of *** multi-core algorithm proposed in this paper leverages this feature to decompose pattern set so that the parallel execution time is *** formulate the problem as an optimal decomposition and scheduling of a pattern set,then propose a heuristic algorithm,which takes advantage of dynamic programming and greedy algorithmic techniques,to solve the optimization *** results suggest that our decomposition approach can increase the searching speed by more than 200% on a 4-core AMD Barcelona system.
Currently, with the evolution of virtualization technology, cloud computing mode has become more and more popular. However, people still concern the issues of the runtime integrity and data security of cloud computing...
详细信息
Currently, with the evolution of virtualization technology, cloud computing mode has become more and more popular. However, people still concern the issues of the runtime integrity and data security of cloud computing platform, as well as the service efficiency on such computing platform. At the same time, according to our knowledge, the design theory of the trusted virtual computing environment and its core system software for such network-based computing platform is at the exploratory stage. In this paper, we believe that efficiency and isolation are the two key proprieties of the trusted virtual computing environment. To guarantee these two proprieties, based on the design principle of splitting, customizing, reconstructing, and isolation-based enhancing to the platform, we introduce TRainbow, a novel trusted virtual computing platform developing by our research *** the two creative mechanisms, that is, capacity flowing amongst VMs and VM-based kernel reconstructing, TRainbow provides great improvements (up to 42%) in service performance and isolated reliable computing environment for Internet-oriented, large-scale, concurrent services.
Moore's law continues to grant computer architects ever more transistors in the foreseeable future, and parallelism is the key to continued performance scaling in modern microprocessors. In this paper, the achievemen...
详细信息
Moore's law continues to grant computer architects ever more transistors in the foreseeable future, and parallelism is the key to continued performance scaling in modern microprocessors. In this paper, the achievements in our research project, which is supported by the National Basic Research 973 Program of China, on parallel architecture, are systematically presented. The innovative approaches and techniques to solve the significant problems in parallel architecture design are smnmarized, including architecture level optimization, compiler and language-supported technologies, reliability, power-performance efficient design, test and verification challenges, and platform building. Two prototype chips, a multi-heavy-core Godson-3 and a many-light-core Godson-T, are described to demonstrate the highly scalable and reconfigurable parallel architecture designs. We also present some of our achievements appearing in ISCA, MICRO, ISSCC, HPCA, PLDI, PACT, IJCAI, Hot Chips, DATE, IEEE Trans. VLSI, IEEE Micro, IEEE Trans. computers, etc.
Circular self test path (CSTP) is an attractive technique for testing digital integrated circuits(IC) in the nanometer era, because it can easily provide at-speed test with small test data volume and short test applic...
详细信息
Circular self test path (CSTP) is an attractive technique for testing digital integrated circuits(IC) in the nanometer era, because it can easily provide at-speed test with small test data volume and short test application time. However, CSTP cannot reliably attain high fault coverage because of difficulty of testing random-pattern-resistant faults. This paper presents a deterministic CSTP (DCSTP) structure that consists of a DCSTP chain and jumping logic, to attain high fault coverage with low area overhead. Experimental re- sults on ISCAS’89 benchmarks show that 100% fault coverage can be obtained with low area overhead and CPU time, especially for large circuits.
With the increasing demand and the wide application of high performance commodity multi-core processors,both the quantity and scale of data centers grow dramatically and they bring heavy energy *** and engineers have ...
详细信息
With the increasing demand and the wide application of high performance commodity multi-core processors,both the quantity and scale of data centers grow dramatically and they bring heavy energy *** and engineers have applied much effort to reducing hardware energy consumption,but software is the true consumer of power and another key in making better use of *** software is critical to better energy utilization,because it is not only the manager of hardware but also the bridge and platform between applications and *** this paper,we summarize some trends that can affect the efficiency of data ***,we investigate the causes of software *** on these studies,major technical challenges and corresponding possible solutions to attain green system software in programmability,scalability,efficiency and software architecture are ***,some of our research progress on trusted energy efficient system software is briefly introduced.
Due to complex abstractions implemented over shared data structures protected by locks, conventional symmetric multithreaded operating system kernel such as Linux is hard to achieve high scalability on the emerging mu...
详细信息
This paper describes the design-for-testability (DFT) features and low-cost testing solutions of a general purpose microprocessor. The optimized DFT features are presented in detail. A hybrid scan compression struct...
详细信息
This paper describes the design-for-testability (DFT) features and low-cost testing solutions of a general purpose microprocessor. The optimized DFT features are presented in detail. A hybrid scan compression structure was executed and achieved compression ratio more than ten times. Memory built-in self-test (BIST) circuitries were designed with scan collars instead of bitmaps to reduce area overheads and to improve test and debug efficiency. The implemented DFT framework also utilized internal phase-locked loops (PLL) to provide complex at-speed test clock sequences. Since there are still limitations in this DFT design, the test strategies for this case are quite complex, with complicated automatic test pattern generation (ATPG) and debugging flow. The sample testing results are given in the paper. All the DFT methods discussed in the paper are prototypes for a high-volume manufacturing (HVM) DFT plan to meet high quality test goals as well as slow test power consumption and cost.
With the shrink of the technology into nanometer scale, network-on-chip (NOC) has become a reasonable solution for connecting plenty of IP blocks on a single chip. But it suffers from both crosstalk effects and sing...
详细信息
With the shrink of the technology into nanometer scale, network-on-chip (NOC) has become a reasonable solution for connecting plenty of IP blocks on a single chip. But it suffers from both crosstalk effects and single event upset (SEU), especially crosstalk-induced delay, which may constrain the overall performance of NOC. In this paper, we introduce a reliable NOC design using a code with the capability of both crosstalk avoidance and single error correction. Such a code, named selected crosstalk avoidance code (SCAC) in our previous work, joins crosstalk avoidance code (CAC) and error correction code (ECC) together through codeword selection from an original CAC codeword set. It can handle possible error caused by either crosstalk effects or SEU. When designing a reliable NOC, data are encoded to SCAC codewords and can be transmitted rapidly and reliably across NOC. Experimental results show that the NOC design with SCAC achieves higher performance and is reliable to tolerate single errors. Compared with previous crosstalk avoidance methods, SCAC reduces wire overhead, power dissipation and the total delay. When SCAC is used in NOC, it can save 20% area overhead and reduce 49% power dissipation.
The advent of multi-core/many-core chip technology offers both an extraordinary opportunity and a profound challenge. In particular, computer architects and system software designers are faced with a unique opportunit...
详细信息
The advent of multi-core/many-core chip technology offers both an extraordinary opportunity and a profound challenge. In particular, computer architects and system software designers are faced with a unique opportunity to introducing new architecture features as well as adequate compiler technology -- together they may have profound impact. This paper presents a case study (using the 1-D Jacobi computation) of compiler-amendable performance optimization techniques on a many-core architecture Godson-T. Godson-T architecture has several unique features that are chosen for this study: 1) chip-level global addressable memory in particular the scratchpad memories (SPM) local to the processing cores; 2) fine-grain memory based synchronization (e.g., full-empty bit for fine-grain synchronization). Leveraging state-of-the-art performance optimization methods for 1-D stencil parallelization (e.g., timed tiling and variants), we developed and implement a number of many-core-based optimization for Godson-T. Our experimental study shows good performance in both execution time speedup and scalability, validate the value of globally accessed SPM and fine-grain synchronization mechanism (full-empty bits) under the Godson-T, and provides some useful guidelines for future compiler technology of many-core chip architectures.
暂无评论