Search Results - Inner Mongolia University Library

WASE International Conference on Information Engineering

Authors: Zang, Hongyong; Gu, Kuiyan; Sun, Yuzhong; Meng, Dan. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; Graduate University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; Dongxin Geology Research Institute of Shengli Oilfield, Sinopec, Dongying, Shandong 257094, China

ISBN: (print) 9780769540801

Virtual machine technology can provide high server utilization and service consolidation on a single physical machine, and it has gained acceptance in diverse fields. A growing number of scenarios require high-performance network virtualization. The paravirtualized network system adopts a split driver model and uses a dynamic shared memory mechanism for communication between the unprivileged guest domain and the isolated driver domain. The dynamic shared memory mechanism introduces extra per-packet overheads, such as additional hypercalls and grant table operations. In this paper, we implement a new method called STAMP (static shared memory pipe), which uses a static shared memory mechanism on the Xen platform. STAMP comprises a two-way lockless producer-consumer circular buffer that carries network packets between the split drivers. Once STAMP is established between the guest domain and the isolated driver domain, all packets are transferred through it, avoiding the extra overheads introduced by the dynamic shared memory mechanism. In our evaluations, STAMP achieves both availability and high efficiency, especially for small messages. These results make the static shared memory mechanism an attractive solution for network paravirtualization. © 2010 IEEE.

Keywords: Virtual machine
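
The STAMP abstract above centers on a two-way lockless producer-consumer circular buffer. The Python sketch below shows only the index discipline that lets a single-producer/single-consumer ring work without locks: the producer alone advances `tail`, the consumer alone advances `head`, and one slot stays empty to distinguish "full" from "empty". The class names `SpscRing` and `DuplexPipe` are hypothetical, "two-way" is assumed here to mean one ring per direction, and a real inter-domain version on Xen would place the ring in shared pages and add memory barriers, which this sketch omits.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class SpscRing(Generic[T]):
    """Single-producer/single-consumer circular buffer (no memory barriers)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # One extra slot is kept empty so that head == tail means "empty".
        self._slots: List[Optional[T]] = [None] * (capacity + 1)
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def push(self, item: T) -> bool:
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # ring is full
        self._slots[self._tail] = item  # write the payload first...
        self._tail = nxt                # ...then publish it to the consumer
        return True

    def pop(self) -> Optional[T]:
        if self._head == self._tail:
            return None  # ring is empty
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return item


class DuplexPipe:
    """Hypothetical two-way pipe: one independent ring per direction."""

    def __init__(self, capacity: int) -> None:
        self.to_backend: SpscRing[bytes] = SpscRing(capacity)
        self.to_frontend: SpscRing[bytes] = SpscRing(capacity)


if __name__ == "__main__":
    pipe = DuplexPipe(capacity=2)
    print(pipe.to_backend.push(b"pkt1"), pipe.to_backend.push(b"pkt2"))  # True True
    print(pipe.to_backend.push(b"pkt3"))  # False: ring is full
    print(pipe.to_backend.pop())          # b'pkt1'
```

Because each index has exactly one writer, neither side ever needs to acquire a lock; in shared memory the remaining requirement is ordering, which is why the payload is written before `tail` is advanced.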

On generation of a universal path candidate set containing testable long paths

IEEE International Test Conference

Authors: Zijian He, Tao Lv, Huawei Li, Xiaowei Li. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

We propose an efficient algorithm for generating a universal path candidate set U that contains testable long paths for delay testing. Several strategies are presented to speed up the depth-first search procedure used to generate U, aiming to reduce the number of times the sensitization criteria must be checked. Experimental results show that our approach achieves an average 8X speedup compared with the traditional depth-first search approach.

Keywords: Automatic test pattern generation
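
The abstract does not spell out the paper's speed-up strategies. As a generic illustration only, the sketch below enumerates long input-to-output paths in an acyclic circuit graph by depth-first search and abandons a partial path as soon as (a) even its longest possible completion cannot reach the length threshold, or (b) a caller-supplied check rejects it. The `prefix_ok` callback is a hypothetical stand-in for the sensitization criteria; running it only on prefixes that survive the cheap length bound is one way to reduce how often those criteria are evaluated.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Sequence

Graph = Dict[str, List[str]]  # node -> fan-out nodes (acyclic circuit graph)


def long_paths(graph: Graph, inputs: Sequence[str], outputs: Sequence[str],
               min_len: int,
               prefix_ok: Callable[[List[str]], bool] = lambda path: True,
               ) -> List[List[str]]:
    """Input-to-output paths with at least `min_len` edges whose prefixes pass `prefix_ok`."""
    outs = set(outputs)

    @lru_cache(maxsize=None)
    def longest_to_output(node: str) -> float:
        # Edges on the longest path from `node` to any output (-inf if none is reachable).
        best = 0.0 if node in outs else float("-inf")
        for succ in graph.get(node, []):
            best = max(best, 1 + longest_to_output(succ))
        return best

    found: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        node, length = path[-1], len(path) - 1
        if length + longest_to_output(node) < min_len:
            return  # cheap bound: no extension of this prefix can become long enough
        if not prefix_ok(path):
            return  # hypothetical sensitization check failed on the prefix
        if node in outs and length >= min_len:
            found.append(list(path))
        for succ in graph.get(node, []):
            path.append(succ)
            dfs(path)
            path.pop()

    for src in inputs:
        dfs([src])
    return found


if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["d"], "c": ["d", "o"], "d": ["o"], "o": []}
    print(long_paths(g, ["a"], ["o"], min_len=3))
    # [['a', 'b', 'd', 'o'], ['a', 'c', 'd', 'o']]
```

The short path `a -> c -> o` is never passed to `prefix_ok`, because the length bound rejects it first.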

A multi-FPGA based platform for emulating a 100m-transistor-scale processor with high-speed peripherals (abstract only)

Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays

Authors: Huandong Wang, Xiang Gao, Yunji Chen, Dan Tang, Weiwu Hu. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9781605589114

This paper describes a multi-FPGA based platform for emulating the Loongson-2G microprocessor on different motherboards. The platform is developed for verification and evaluation of the Loongson-2G microprocessor, the next generation of the Loongson-2 family, which is composed of a four-issue, out-of-order, 64-bit MIPS-compatible processor core named GS464, a 1 MB secondary cache, a HyperTransport IO interface, a DDR2/3 memory interface, and several other low-speed IO interfaces. Most parts of this microprocessor are mapped onto the multi-FPGA based platform, which consists of two Virtex-5 330 FPGA chips. Semi-custom partitioning tactics are developed within the entire design flow to synthesize the whole design onto the platform. Architectural-level modifications are applied to the original chip architecture so that it can be easily partitioned into two parts. The high-speed SerDes of the HyperTransport IO link and the DDR2/3 memory interface are emulated using several clocks with different phases. To address the difficulty of debugging an FPGA system, a software-probe method aided by hardware modules injected into the FPGA is developed and used to debug problems caused by behavioral mismatches between the ASIC RAM blocks and the FPGA RAM blocks. Some performance evaluation of Loongson-2G is carried out on this platform as pre-silicon testing. To the authors' knowledge, no previous work has used such a large design for verification and evaluation.

Keywords: multi-FPGA, emulation, Loongson, verification, evaluation, FPGA

Using Index in the MapReduce Framework

International Asia-Pacific Web Conference (APWEB)

Authors: Mingyuan An, Yang Wang, Weiping Wang. Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

MapReduce is a programming framework introduced by Google for large-scale data processing. It is usually used in a scan-centric fashion: all the data are split into blocks, a Map is generated for each block to scan and process the data in it, and Reduces then merge the outputs of all the Maps. When a query needs to process only a subset of the data selected by a predicate, this brute-force method may waste I/O on irrelevant data, and the overhead of launching so many Maps may be significant given that the data actually relevant to the query is comparatively small. We propose an approach that integrates an index into MapReduce execution, in which only an appropriate number of Maps are generated, each accessing the data through the index. This approach incurs random I/O and remote data access, so its overall performance depends on both system parameters and query characteristics. We build a cost model for both this index access execution and the traditional full scan execution, which can be used to choose between the two execution modes before a query runs. Experiments show that index access execution can greatly outperform full scan execution when the selectivity of the predicate is low, and that the cost model predicts the actual execution cost very well and can therefore be used to determine the execution plan for a query.

Keywords: Flash memory, Indexing, Computer science, Delay, Tree data structures, Energy efficiency, Nonvolatile memory, Costs, Mechanical factors, Energy storage
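
The abstract describes choosing between index access and full scan with a cost model but gives no formulas. The sketch below is a deliberately simplified, hypothetical model: every field of `SystemParams` is an assumption, Map tasks run in waves of `parallel_slots`, a full scan reads every block sequentially, and index access pays one random I/O per matching record. It only shows how such a model can pick a plan before execution; it is not the paper's model.

```python
import math
from dataclasses import dataclass


@dataclass
class SystemParams:
    # Hypothetical parameters for illustration; not taken from the paper.
    block_bytes: float     # size of one input block
    seq_bw: float          # sequential read bandwidth per Map, bytes/s
    random_io_s: float     # seconds per random, index-driven record fetch
    map_startup_s: float   # fixed cost of launching one Map task, s
    parallel_slots: int    # number of Map tasks that can run at once


def full_scan_cost(total_bytes: float, p: SystemParams) -> float:
    """One Map per block; each Map reads its whole block sequentially."""
    maps = math.ceil(total_bytes / p.block_bytes)
    per_map = p.map_startup_s + p.block_bytes / p.seq_bw
    return math.ceil(maps / p.parallel_slots) * per_map


def index_access_cost(matching: int, records_per_map: int, p: SystemParams) -> float:
    """Only enough Maps for the matching records; each record costs one random I/O."""
    maps = max(1, math.ceil(matching / records_per_map))
    per_map = p.map_startup_s + min(matching, records_per_map) * p.random_io_s
    return math.ceil(maps / p.parallel_slots) * per_map


def choose_plan(total_bytes: float, matching: int, records_per_map: int,
                p: SystemParams) -> str:
    """Pick whichever execution mode the model predicts to be cheaper."""
    scan = full_scan_cost(total_bytes, p)
    index = index_access_cost(matching, records_per_map, p)
    return "index" if index < scan else "scan"


if __name__ == "__main__":
    p = SystemParams(block_bytes=64e6, seq_bw=100e6, random_io_s=0.01,
                     map_startup_s=1.0, parallel_slots=10)
    print(choose_plan(64e9, 1_000, 100, p))       # index (few matching records)
    print(choose_plan(64e9, 10_000_000, 100, p))  # scan  (many matching records)
```

With these made-up numbers the full scan costs about 164 s regardless of the predicate, while index access grows with the number of matching records, so the crossover reflects the abstract's observation that indexing pays off at low selectivity.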

A heuristic algorithm for optimizing page selection instructions

2010 2nd International Conference on Software Technology and Engineering, ICSTE 2010

Authors: Li, Qing'an; He, Yanxiang; Chen, Yong; Wu, Wei; Xu, Wenwen. School of Computer, Wuhan University, Wuhan 430072, China; State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

ISBN: (print) 9781424486656

Page switching is a technique that increases the memory available to microcontrollers without extending the address bus. This technique is widely used in the design of 8-bit MCUs. In this paper, we present an algorithm to reduce the overhead of page switching. To achieve small code size, we focus on allocating functions to suitable pages with a heuristic algorithm, thereby enabling cost-effective placement of page selection instructions. Our experimental results show that the optimization reduces code size by 13.2 percent. © 2010 IEEE.

Keywords: Cost effectiveness
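
The abstract does not describe the heuristic itself. The sketch below illustrates the general idea under stated assumptions: each function has a known size, static call-site counts between functions are known, and every call between functions on different pages is assumed to cost one page selection instruction. Functions are placed greedily, most heavily connected first, onto the page with the strongest call affinity that still has room. The function names and the cost rule are hypothetical, not the paper's algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Calls = Dict[Tuple[str, str], int]  # (caller, callee) -> static call-site count


def page_select_count(calls: Calls, page_of: Dict[str, int]) -> int:
    """Assumed cost model: each cross-page call site needs one page selection instruction."""
    return sum(n for (a, b), n in calls.items() if page_of[a] != page_of[b])


def greedy_allocate(sizes: Dict[str, int], calls: Calls, page_size: int) -> Dict[str, int]:
    """Map each function to a page index, keeping heavily calling pairs together."""
    affinity: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for (a, b), n in calls.items():
        affinity[a][b] += n
        affinity[b][a] += n

    page_of: Dict[str, int] = {}
    free: List[int] = []  # remaining capacity of each opened page
    # Place the most heavily connected functions first.
    for f in sorted(sizes, key=lambda g: -sum(affinity[g].values())):
        if sizes[f] > page_size:
            raise ValueError(f"{f} does not fit in a single page")
        best, best_score = -1, -1
        for p, room in enumerate(free):
            if room >= sizes[f]:
                # Call weight between f and functions already placed on page p.
                score = sum(n for g, n in affinity[f].items() if page_of.get(g) == p)
                if score > best_score:
                    best, best_score = p, score
        if best < 0:  # no open page has room: open a new one
            free.append(page_size)
            best = len(free) - 1
        free[best] -= sizes[f]
        page_of[f] = best
    return page_of


if __name__ == "__main__":
    sizes = {"main": 40, "f": 30, "g": 30, "h": 50}
    calls = {("main", "f"): 5, ("f", "g"): 4, ("main", "h"): 1}
    placement = greedy_allocate(sizes, calls, page_size=100)
    print(placement)                            # {'f': 0, 'main': 0, 'g': 0, 'h': 1}
    print(page_select_count(calls, placement))  # 1
```

Only the single, least frequent call (`main` to `h`) crosses a page boundary here, whereas putting every function on its own page would need 10 page selections under the same cost model.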

Performance-asymmetry-aware topology virtualization for defect-tolerant NoC-based many-core processors

Design, Automation and Test in Europe Conference and Exhibition

Authors: Lei Zhang, Yue Yu, Jianbo Dong, Yinhe Han, Shangping Ren, Xiaowei Li. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China; Department of Computer Science, Illinois Institute of Technology, USA

ISBN: (print) 9783981080162

Topology virtualization techniques have been proposed for NoC-based many-core processors with core-level redundancy to isolate the hardware changes caused by defective on-chip cores. Prior work focuses on homogeneous cores with symmetric performance and optimizes on-chip communication only. However, core-to-core performance asymmetry caused by manufacturing process variations poses new challenges for constructing virtual topologies. Lower-performance cores may be scattered across a virtual topology, while operating systems typically allocate tasks to contiguous cores. As a result, parallel applications are likely to be assigned to a region containing many slower cores that become bottlenecks. To tackle this problem, we present a novel performance-asymmetry-aware reconfiguration algorithm, Bubble-Up, based on a new metric called the core fragmentation factor (CFF). Bubble-Up places cores with similar performance closer together while maintaining reasonable hop distances between virtual neighbors, thus accelerating applications with a higher degree of parallelism without changing the OS's existing allocation strategies. Experimental results show its effectiveness.

Keywords: Network topology, Network-on-a-chip, Programming profession, Hardware, System-on-a-chip, Manufacturing processes, Scattering, Operating systems, Parallel processing, Throughput

Outlier Detection for Learning-Based Optimizing Compiler

Japan-China Joint Workshop on Frontier of Computer Science and Technology (FCST)

Authors: Shun Long, Weiheng Zhu. Department of Computer Science, Jinan University, Guangzhou, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China

Modern compilers use machine learning to derive, from prior experience, useful heuristics for newly encountered programs in order to accelerate the optimization process. However, prior experience may not apply to outlier programs with unfamiliar code features. This paper presents an outlier detection approach based on the Reverse K-Nearest Neighbor (RKNN) algorithm. The compiler can then launch a search within the optimization space when an outlier program is encountered, or directly apply its experience to non-outliers. Preliminary experimental results demonstrate the effectiveness of the approach.

Keywords: Optimization, Program processors, Training, Kernel, Benchmark testing, Arrays, Nearest neighbor searches
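
A rough illustration of the reverse k-nearest-neighbor idea named in the abstract: a point is suspicious when few other points count it among their own k nearest neighbors. The sketch below computes those reverse-neighbor counts by brute force and flags points below a threshold. Representing programs as numeric feature vectors, the Euclidean metric, and the `min_count` threshold are assumptions made for the example, not details from the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[i], excluding itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def reverse_knn_counts(points: List[Point], k: int) -> List[int]:
    """counts[j] = how many points have points[j] among their k nearest neighbors."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn(points, i, k):
            counts[j] += 1
    return counts


def rknn_outliers(points: List[Point], k: int, min_count: int = 1) -> List[int]:
    """Flag points that fewer than `min_count` others regard as a near neighbor."""
    return [j for j, c in enumerate(reverse_knn_counts(points, k)) if c < min_count]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors: four similar programs and one unfamiliar one.
    features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 12)]
    print(reverse_knn_counts(features, k=2))  # [2, 3, 2, 3, 0]
    print(rknn_outliers(features, k=2))       # [4]
```

Unlike a plain kNN distance test, the reverse count asks whether the rest of the training set "recognizes" a program as similar, which matches the abstract's notion of a program the compiler's prior experience does not cover.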

On synchronization and evaluation method of chipped many-core processor

Jisuanji Xuebao/Chinese Journal of Computers, 2010, Vol. 33, No. 10, pp. 1777-1787

Authors: Xu, Wei-Zhi; Song, Feng-Long; Liu, Zhi-Yong; Fan, Dong-Rui; Yu, Lei; Zhang, Shuai. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100039, China

Synchronization schemes are critical for on-chip multi-core and many-core processors to execute correctly and communicate cooperatively, and their efficiency matters greatly to processor performance. In this paper, three types of synchronization schemes are proposed for on-chip many-core architectures: two coarse-grain schemes, based on dedicated hardware support and on atomic operations respectively, and a fine-grain scheme based on Full/Empty bits. Evaluation criteria and methods are then proposed, including quantitative evaluation micro-benchmarks designed for the coarse-grain schemes. Finally, the coarse-grain synchronization schemes are evaluated on a many-core architecture simulator (Godson-T) and on a commercial AMD Opteron on-chip multiprocessor, using the pThread multi-threaded programming model. The results show that hardware support significantly improves synchronization performance on on-chip many-core processors, and that the performance loss of the traditional scheme based on atomic instructions is caused mostly by waiting due to load imbalance and by serialization at synchronization points.

Keywords: Synchronization
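
In hardware, a Full/Empty bit lets a load or store wait on the state of an individual memory word, which is what makes the third scheme fine-grained. The Python sketch below emulates those semantics in software with `threading.Condition`, purely to show the behavior; the class and method names (`FullEmptyCell`, `write_ef`, `read_fe`) are hypothetical, and the sketch says nothing about how Godson-T implements the bit.

```python
import threading
from typing import Any


class FullEmptyCell:
    """Software emulation of a memory word tagged with a Full/Empty bit.

    write_ef: wait until the word is empty, store a value, mark it full.
    read_fe:  wait until the word is full, load the value, mark it empty.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._full = False
        self._value: Any = None

    def write_ef(self, value: Any) -> None:
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()  # wake readers waiting for "full"

    def read_fe(self) -> Any:
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value, self._value = self._value, None
            self._full = False
            self._cond.notify_all()  # wake writers waiting for "empty"
            return value


if __name__ == "__main__":
    cell = FullEmptyCell()
    received = []
    consumer = threading.Thread(
        target=lambda: received.extend(cell.read_fe() for _ in range(5)))
    consumer.start()
    for i in range(5):
        cell.write_ef(i)  # blocks until the consumer has emptied the word
    consumer.join()
    print(received)  # [0, 1, 2, 3, 4]
```

Each word acts as a one-slot producer-consumer channel, so synchronization happens per datum rather than at a global barrier or lock.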

Accelerating Lightpath setup via broadcasting in binary-tree waveguide in Optical NoCs

Design, Automation and Test in Europe Conference and Exhibition

Authors: Binzhang Fu, Yinhe Han, Huawei Li, Xiaowei Li. Chinese Academy of Sciences, Beijing, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9783981080162

In this paper, we propose an Optical Network-on-Chip (ONoC) connected by a binary-tree waveguide to accelerate lightpath establishment. By broadcasting control data over the proposed power-efficient binary-tree waveguide, the maximum number of hops needed to establish a lightpath is reduced to two. Through extensive simulation and analysis, we demonstrate that the proposed ONoC significantly reduces the setup time and, consequently, the packet latency.

Keywords: Optical waveguides, Acceleration, Broadcasting, Network-on-a-chip, Optical interconnections, Delay, Optical modulation, Logic, Microcavities, Optical devices

An abstraction-guided simulation approach using Markov models for microprocessor verification

Design, Automation and Test in Europe Conference and Exhibition

Authors: Tao Zhang, Tao Lv, Xiaowei Li. Chinese Academy of Sciences, Beijing, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9783981080162

To combine the power of simulation-based and formal techniques, semi-formal methods have been widely explored. Among them, abstraction-guided simulation is particularly promising. In this paper, we propose an abstraction-guided simulation approach aimed at covering hard-to-reach states in the functional verification of microprocessors. A Markov model is constructed from the high-level functional specification, i.e., the ISA, and this model incorporates vector correlations. Furthermore, several strategies that use abstraction information are proposed to effectively guide test generation. Experimental results on two complex microprocessors show that our approach covers hard-to-reach states more efficiently than similar methods. Compared with work using other intelligent engines, our approach can guarantee a higher hit ratio on target states without loss of efficiency.

Keywords: Microprocessors, Testing, Computational modeling, Engines, Data mining, Computer simulation, Power system modeling, Laboratories, Computer architecture, Instruction sets
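
The following Python sketch shows only the basic mechanism behind Markov-model-driven stimulus generation: instruction classes as states, weighted transitions, and a random walk that emits an instruction-class sequence. The class names, weights, and the `boost_target` guidance rule (scaling up transitions into a target state) are hypothetical; the paper's ISA-derived model, vector correlations, and abstraction-guided strategies are not reproduced here.

```python
import random
from typing import Dict, List

# Transition weights between hypothetical instruction classes.
Chain = Dict[str, Dict[str, float]]


def boost_target(chain: Chain, target: str, factor: float) -> Chain:
    """Return a copy of the chain with every transition into `target` scaled by `factor`."""
    return {
        state: {nxt: w * (factor if nxt == target else 1.0) for nxt, w in succ.items()}
        for state, succ in chain.items()
    }


def generate(chain: Chain, start: str, length: int, rng: random.Random) -> List[str]:
    """Random walk of `length` states; weights are relative and need not sum to 1."""
    seq = [start]
    while len(seq) < length:
        succ = chain[seq[-1]]
        seq.append(rng.choices(list(succ), weights=list(succ.values()))[0])
    return seq


EXAMPLE: Chain = {
    "alu": {"alu": 0.6, "load": 0.3, "branch": 0.1},
    "load": {"alu": 0.7, "store": 0.3},
    "store": {"alu": 1.0},
    "branch": {"alu": 1.0},
}

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible sequence
    guided = boost_target(EXAMPLE, "branch", 5.0)
    print(generate(guided, "alu", 12, rng))
```

Reweighting transitions rather than editing the generated sequences keeps every emitted sequence legal under the model while steering the walk toward the states of interest.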
