Virtual machine technology can provide high server utilization and service consolidation on an individual physical machine, and gains acceptance in diverse fields. In a growing number of contexts, many situations requ...
We propose an efficient algorithm for generating a universal path candidate set U that contains testable long paths for delay testing. Several strategies are presented to speed up the depth-first search that generates U by reducing the number of sensitization-criteria checks. Experimental results show that our approach achieves an 8x speedup on average over the traditional depth-first search approach.
ISBN: 9781605589114 (print)
This paper describes a multi-FPGA platform for emulating the Loongson-2G microprocessor on different motherboards. The platform is developed for verification and evaluation of the Loongson-2G microprocessor, the next generation of the Loongson-2 family, which comprises one four-issue, out-of-order, 64-bit MIPS-compatible processor core named GS464, a 1 MB secondary cache, a HyperTransport I/O interface, a DDR2/3 memory interface, and several low-speed I/O interfaces. Most of the microprocessor is mapped onto the platform, which consists of two Virtex-5 330 FPGA chips. Semi-custom partitioning tactics are developed within the design flow to synthesize the whole design onto the platform. Architectural-level modifications are applied to the original chip architecture to make it easy to partition into two parts. The high-speed SerDes of the HyperTransport I/O link and the DDR2/3 memory interface are emulated using several clocks with different phases. Because FPGA systems are hard to debug, a software-probe method assisted by hardware modules injected into the FPGA is developed and used to debug problems caused by behavioral mismatches between the ASIC RAM blocks and the FPGA RAM blocks. Some performance evaluation of the Loongson-2G is carried out on this platform as a pre-silicon test. To the authors' knowledge, no previous work has used such a large design for verification and evaluation.
MapReduce is a programming framework introduced by Google for large-scale data processing. It is usually used in a scan-centric fashion: all the data are split into blocks, a Map is generated for each block to scan and process the data in it, and Reduces then merge the outputs of all the Maps. When a query processes only a subset of the data selected by a predicate, this brute-force method spends extra I/O on irrelevant data, and the overhead of launching so many Maps can be non-trivial given that the data actually relevant to the query is comparatively small. We propose an approach that integrates an index into MapReduce execution, in which only an appropriate number of Maps are generated, each of which accesses the data through the index. This approach incurs random I/O and remote data access, so overall performance depends on both system parameters and query characteristics. We build a cost model for both index access execution and traditional full scan execution, which can be used to choose between the two modes before a query is executed. Experiments show that index access execution can greatly outperform full scan execution when the selectivity of the predicate is low, and that the cost model predicts the actual execution cost very well, so it can be used to determine the execution plan for a query.
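To make the plan-selection idea concrete, here is a minimal sketch of how a cost model might choose between full scan and index access. It is not the model from the paper: every parameter name (`seq_mb_per_s`, `random_io_s`, `map_startup_s`, and so on) and both cost formulas are illustrative assumptions that capture only the trade-off the abstract describes (many scan Maps with sequential I/O versus a few index Maps with random I/O).

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class ClusterParams:
    """Hypothetical system parameters; none of these values come from the paper."""
    seq_mb_per_s: float    # sequential scan throughput of one Map task
    random_io_s: float     # cost of one random (possibly remote) record fetch
    map_startup_s: float   # fixed cost of launching one Map task
    block_mb: float        # block size handed to each scan Map
    parallel_slots: int    # number of Map tasks that can run concurrently


def waves(tasks: int, slots: int) -> int:
    """Number of scheduling rounds needed to run `tasks` on `slots` slots."""
    return ceil(tasks / slots)


def full_scan_cost(data_mb: float, p: ClusterParams) -> float:
    """Assumed model: one Map per block, each scanning a whole block sequentially."""
    n_maps = max(1, ceil(data_mb / p.block_mb))
    per_map = p.map_startup_s + p.block_mb / p.seq_mb_per_s
    return waves(n_maps, p.parallel_slots) * per_map


def index_access_cost(n_records: int, selectivity: float,
                      n_maps: int, p: ClusterParams) -> float:
    """Assumed model: matching records are fetched by random I/O, split evenly over Maps."""
    matches = n_records * selectivity
    per_map = p.map_startup_s + (matches / n_maps) * p.random_io_s
    return waves(n_maps, p.parallel_slots) * per_map


def choose_plan(data_mb: float, n_records: int, selectivity: float,
                n_index_maps: int, p: ClusterParams) -> str:
    """Pick the execution mode with the lower estimated cost."""
    scan = full_scan_cost(data_mb, p)
    index = index_access_cost(n_records, selectivity, n_index_maps, p)
    return "index" if index < scan else "scan"
```

Under these assumed formulas, a highly selective predicate (few matching records) favors index access, while a predicate that matches a large fraction of the records makes the random I/O term dominate and favors the full scan.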
Page switching is a technique that increases the memory in microcontrollers without extending the address buses. This technique is widely used in the design of 8-bit MCUs. In this paper, we present an algorithm to red...
ISBN: 9783981080162 (print)
Topology virtualization techniques have been proposed for NoC-based many-core processors with core-level redundancy to isolate hardware changes caused by defective on-chip cores. Prior work focuses on homogeneous cores with symmetric performance and optimizes on-chip communication only. However, core-to-core performance asymmetry due to manufacturing process variations poses new challenges for constructing virtual topologies. Lower-performance cores may be scattered over a virtual topology, while operating systems typically allocate tasks to contiguous cores. As a result, parallel applications are likely to be assigned to a region containing many slower cores, which become bottlenecks. To tackle this problem, we present Bubble-Up, a novel performance-asymmetry-aware reconfiguration algorithm based on a new metric called the core fragmentation factor (CFF). Bubble-Up places cores with similar performance closer together while maintaining reasonable hop distances between virtual neighbors, thus accelerating applications with a higher degree of parallelism without changing the OS's existing allocation strategies. Experimental results show its effectiveness.
Modern compilers use machine learning to derive useful heuristics from prior experience and apply them to newly encountered programs, accelerating the optimization process. However, prior experience might not be applicable to outlier programs with unfamiliar code features. This paper presents an approach to outlier detection based on a reverse k-nearest neighbor (RKNN) algorithm. The compiler can then launch a search within an optimization space when it encounters an outlier program, or directly apply its experience to non-outliers. Preliminary experimental results demonstrate the effectiveness of the approach.
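The general idea of reverse k-nearest-neighbor outlier detection can be sketched as follows: a point is suspicious when few other points count it among their own k nearest neighbors. The code below is a generic illustration under that assumption, not the paper's algorithm; the feature vectors, the Euclidean distance, and the `min_reverse_neighbors` threshold are all hypothetical choices.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean distance, excluding i)."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]


def reverse_knn_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def find_outliers(points, k, min_reverse_neighbors):
    """Flag points whose reverse-neighbor count falls below an assumed threshold."""
    counts = reverse_knn_counts(points, k)
    return [i for i, c in enumerate(counts) if c < min_reverse_neighbors]


# Hypothetical 2-D "code feature" vectors: four similar programs and one unusual one.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
outliers = find_outliers(features, k=2, min_reverse_neighbors=1)
```

In this toy data set only the last vector is flagged, because no other point has it among its two nearest neighbors. A compiler following the abstract's scheme would search the optimization space for such a program and reuse learned heuristics for the rest.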
Synchronization schemes are critical for on-chip multi-core and many-core processors to execute correctly and communicate cooperatively, and their efficiency strongly affects processor performance. In this paper, three types of synchronization schemes are proposed for on-chip many-core architectures: two coarse-grain schemes, based on dedicated hardware support and on atomic operations respectively, and a fine-grain scheme based on Full/Empty bits. Evaluation criteria and methods are then proposed, including quantitative micro-benchmarks designed for the coarse-grain schemes. Finally, the coarse-grain schemes are evaluated on Godson-T, a many-core architecture simulator, and on a commercial AMD Opteron chip multiprocessor using the Pthreads multithreaded programming model. The results show that hardware support markedly improves synchronization performance on on-chip many-core processors, and that the performance loss of the traditional scheme based on atomic instructions is caused mostly by waiting due to load imbalance and by serialization at the synchronization point.
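For readers unfamiliar with Full/Empty-bit synchronization, the sketch below emulates its usual semantics in software: each memory word carries a state bit, a write blocks until the word is empty and then marks it full, and a read blocks until the word is full and then marks it empty. This is only an illustration of the concept using Python's `threading.Condition`; it is not the hardware scheme evaluated in the paper, and the class and method names are hypothetical.

```python
import threading


class FullEmptyCell:
    """One memory word guarded by an emulated full/empty bit."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False   # the full/empty bit; starts empty
        self._value = None

    def write_when_empty(self, value):
        """Block until the cell is empty, store the value, and mark the cell full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Block until the cell is full, take the value, and mark the cell empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_producer_consumer(n):
    """Pass n values from a producer thread to a consumer thread through one cell."""
    cell = FullEmptyCell()
    received = []

    def producer():
        for i in range(n):
            cell.write_when_empty(i)

    def consumer():
        for _ in range(n):
            received.append(cell.read_when_full())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because every read must be preceded by exactly one write, the consumer receives the values in the order they were produced, with no separate lock or barrier around the data word. That per-word handshake is what makes the scheme fine-grain.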
ISBN: 9783981080162 (print)
In this paper, we propose an Optical Network-on-Chip (ONoC) connected by a binary-tree waveguide to accelerate lightpath establishment. By broadcasting the control data over the proposed power-efficient binary-tree waveguide, the maximum number of hops needed to establish a lightpath is reduced to two. Through extensive simulation and analysis, we demonstrate that the proposed ONoC significantly reduces the setup time and, in turn, the packet latency.
ISBN: 9783981080162 (print)
To combine the power of simulation-based and formal techniques, semi-formal methods have been widely explored. Among these methods, abstraction-guided simulation is particularly promising. In this paper, we propose an abstraction-guided simulation approach that aims to cover hard-to-reach states in the functional verification of microprocessors. A Markov model that integrates vector correlations is constructed from the high-level functional specification, i.e., the ISA. Furthermore, several strategies that use abstraction information are proposed to guide test generation effectively. Experimental results on two complex microprocessors show that our approach covers hard-to-reach states more efficiently than similar methods. Compared with work based on other intelligent engines, our approach achieves a higher hit ratio on target states without loss of efficiency.