Search results - Inner Mongolia University Library

IEEE International Conference on Computer Research and Development

Authors: Li, Yu-Gang; Zhao, Kun; Huang, Yu-Qing; Qiu, Zhen-Ge; Liu, Zhi-Yong. Beijing Lab. of Intelligent Information Technology, School of Computer Science, BIT, Beijing 100081, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100081, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9781612848372

Ray tracing can produce high-quality images; however, its use has been limited by its high demands on computational power and memory bandwidth, especially in the case of satellite imagery. In this paper, we propose a scalable parallel ray tracing algorithm for satellite imagery on a cluster of multi-core machines. The algorithm combines demand-driven and data-driven models in order to reduce communication at the cost of maintaining redundant data. To make the most of the computer's parallelism and increase computational efficiency, it combines two levels of parallelism: the thread-level parallelism (TLP) offered by the many-core architecture and the inter-node parallelism, via MPI and OpenMP. Experimental results show that the algorithm is highly scalable. © 2011 IEEE.

Keywords: Satellite imagery

A domain partition model approach to the online fault recovery of FPGA-based reconfigurable systems

Authors: Shang, Lihong; Zhou, Mi; Hu, Yu; Yang, Erfu. School of Computer Science and Engineering, Beihang University, Beijing 100191, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow G1 1XW, United Kingdom

Field programmable gate arrays (FPGAs) are widely used in reliability-critical systems due to their reconfiguration ability. However, with shrinking device feature sizes and increasing die area, today's FPGAs can be severely affected by errors induced by electromigration and radiation. To improve the reliability of FPGA-based reconfigurable systems, a permanent fault recovery approach using a domain partition model is proposed in this paper. In the proposed approach, the FPGA recovers from faults by reloading a proper configuration from a pool of multiple alternative configurations with overlaps. The overlaps are represented as a set of vectors in the domain partition model. To enhance reliability, a procedure is also presented in which the set of vectors is heuristically filtered so that the corresponding small overlaps can be merged into larger ones. Experimental results on several benchmark circuits demonstrate the effectiveness of the proposed approach. Compared with previous approaches, the proposed approach increases MTTF by up to 18.87%. Copyright © 2011 The Institute of Electronics, Information and Communication Engineers.

Keywords: Fault tolerance

Verification of Complex Instructions in Protected Mode and Long Mode for an X86-Architecture Processor

7th China Test Conference

Authors: Hao Shuai (郝帅); Lv Tao (吕涛); Li Xiao Wei (李晓维). State Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

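This paper presents the functional verification of the complex instructions of an X86-architecture processor in protected mode and long mode. By analyzing the operating modes of X86-architecture processors, templates were built for entering protected mode and long mode (long mode comprises compatibility mode and 64-bit mode), laying the foundation for functional verification of the instruction set in these three modes. Based on a study of the number and difficulty of memory accesses made by X86 instructions, the instruction set is divided into general instructions, moderately complex instructions, and complex instructions. Directed functional verification was carried out on the complex instructions CALL FAR, JUMP FAR, RETURN FAR, INT, and IRET. More than 800 test stimuli were written, 100% functional-point coverage was achieved, and 32 design errors were found.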

Keywords: Integrated circuit testing; Complex instructions; Operating modes; Functional verification

Computation Pattern Driven Reuse of Manual Optimizations for GPGPUs

IEEE International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)

Authors: Shixiong Xu; Dongni Han; Li Chen. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Graduate University of Chinese Academy of Sciences, Beijing, China

The wide application of General Purpose Graphics Processing Units (GPGPUs) demands large manual effort to port and optimize algorithms for them. However, most existing automatic ways of generating GPGPU code fail to apply optimization strategies tailored to a specific computation and to reuse constantly evolving manual optimizations. In this paper, we present a computation pattern driven approach for computation-specific GPGPU code generation and optimization, which in turn reuses manual optimizations to a certain extent. We suggest language extensions to OpenMP, namely high-level data structure attributes, to assist computation pattern matching and to give users intuitive performance tuning parameters expressed as data structure attributes. We illustrate the feasibility of this approach through three important computation dwarfs in scientific computing: dense matrix, sparse matrix, and structured mesh computation. We also build a prototype OpenMP-to-CUDA translator that consists of computation pattern recognition and code template instantiation. The experimental results demonstrate the performance benefits of the computation pattern driven method. To the best of our knowledge, this is the first work on reusing manual optimizations for GPGPUs with a computation pattern driven approach.

Keywords: Data structures; Optimization; Pattern matching; Sparse matrices; Manuals; Libraries; Tuning

An efficient shared memory based virtual communication system for embedded SMP cluster

IEEE International Conference on Networking, Architecture and Storage

Authors: Yin, Wenxuan; Gao, Xiang; Zhu, Xiaojing; Guo, Deyuan. Graduate University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Loongson Technology Corporation Limited, Beijing, China; Institute of Microelectronics, Tsinghua University, Beijing, China

ISBN: (print) 9780769545097

With the prevalence of multi-core processors, embedded clusters increasingly deploy SMP nodes to gain more computing power. A crucial issue is that MPI interprocess communication has to reconcile high performance with embedded constraints. Moreover, there is a large performance gap between intra-node and inter-node communication because they use different infrastructures. In this paper, we design a virtual communication system called SMVN, which extends the shared memory mechanism typically used within a node to the inter-node case. SMVN uses the HT inter-chip interconnect interface in Godson-3A SMP nodes to build a mesh topology. It is Ethernet compatible because it simulates the bottom layers of the TCP/IP protocol stack. With this design, the node interconnection needs no NICs, cables, or switches. Furthermore, we exploit a zero-copy scheme and other optimizations to improve performance. We port the MPICH2 library over its socket channel and formulate its process allocation. The MPI latency and bandwidth tests show that the performance difference between the two levels is small. The inter-node bandwidth is 27.3 MB/s, which is more than twice the theoretical peak of 100 Mb Ethernet and reaches 84% of the intra-node performance. © 2011 IEEE.
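The bandwidth figures in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that "100 Mb Ethernet" means 100 megabits per second and that 1 byte is 8 bits; the variable names are illustrative and do not come from the paper.

```python
# Cross-check of the bandwidth figures quoted in the abstract.
# Assumptions: "100 Mb Ethernet" = 100 megabits per second, 1 byte = 8 bits.

ETHERNET_PEAK_MBIT_S = 100.0   # theoretical peak of 100 Mb Ethernet (abstract)
INTER_NODE_MB_S = 27.3         # measured inter-node bandwidth (abstract)
INTER_TO_INTRA_RATIO = 0.84    # inter-node reaches 84% of intra-node (abstract)

# Convert the Ethernet peak from megabits to megabytes per second.
ethernet_peak_mb_s = ETHERNET_PEAK_MBIT_S / 8

# How many times the Ethernet peak the inter-node bandwidth reaches.
ratio_vs_ethernet = INTER_NODE_MB_S / ethernet_peak_mb_s

# Intra-node bandwidth implied by the 84% figure.
implied_intra_node_mb_s = INTER_NODE_MB_S / INTER_TO_INTRA_RATIO

print(f"Ethernet peak:        {ethernet_peak_mb_s:.1f} MB/s")
print(f"Inter-node / peak:    {ratio_vs_ethernet:.2f}x")
print(f"Implied intra-node:   {implied_intra_node_mb_s:.1f} MB/s")
```

Under these assumptions the Ethernet peak is 12.5 MB/s, so 27.3 MB/s is about 2.18 times that peak, consistent with "more than twice", and the implied intra-node bandwidth is about 32.5 MB/s.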

Keywords: Bandwidth

Design of a Fault-Tolerant On-Chip Router Based on Datapath Slicing and Redundancy Strategies

7th China Test Conference

Authors: Lu Hang (路航); Han Yinhe (韩银和); Wang Ying (王颖); Li Xiaowei (李晓维). State Key Laboratory of Computer System and Architecture, Beijing 100190; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

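To meet the communication bandwidth requirements of systems-on-chip, networks-on-chip have gradually become the mainstream interconnect for multi-core processors. However, as integrated circuit technology enters the nanometer era, manufacturing defects, charged-particle strikes, and other causes make faults in the on-chip network markedly more likely, and in severe cases they can crash the entire system. Earlier fault-tolerance techniques let an on-chip router time-multiplex other correctly working datapath slices, improving the router's fault tolerance and reducing transmission latency. However, as the number of faulty slices grows, the router's reliability and performance are severely affected. Exploiting the fact that datapath components can be sliced, this paper provides redundant backup slices: when a slice fails, a redundant slice replaces it, improving router reliability. Experimental results show that the proposed fault-tolerant router has lower hardware overhead than dual modular redundancy, achieves an SPF above 12, and keeps more than 98% of nodes working normally when a certain number of faults are present in the network.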

Keywords: Integrated circuits; Network-on-chip; Fault-tolerant router; Design flow; Datapath; Redundant backup

An energy-efficient scheduling approach based on private clouds

Journal of Information and Computational Science, 2011, vol. 8, no. 4, pp. 716-724

Authors: Li, Jiandun; Peng, Junjie; Lei, Zhou; Zhang, Wu. School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

With the further development and wide acceptance of cloud computing, many companies and colleges have decided to adopt it in their own data centers, which are known as private clouds. Since private clouds have some unique characteristics and special requirements, effectively scheduling virtual machine requests onto compute nodes remains a challenging problem, especially when multiple objectives must be met. In this paper, we explore several characteristics related to workflow scheduling in private clouds and propose a hybrid energy-efficient scheduling approach. The experiments show that it can save more time for users, conserve more energy, and achieve a higher level of load balancing. Copyright © 2011 Binary Information Press.

Keywords: Cloud computing

A scheduling algorithm for private clouds

Journal of Convergence Information Technology, 2011, vol. 6, no. 7, pp. 1-9

Authors: Li, Jiandun; Peng, Junjie; Zhang, Wu. School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

In contrast with public clouds, private clouds have some unique features, especially with regard to workflow scheduling. The tradeoff between power and performance remains one of the key concerns. Building on our previous research, in this paper we propose a hybrid energy-efficient scheduling algorithm using dynamic migration. The experiments show that it can not only reduce the response time and conserve more energy, but also achieve a higher level of load balancing.

Keywords: Scheduling algorithms

Parallelizing a Machine Translation Decoder for Multicore Computer

2011 Seventh International Conference on Natural Computation (ICNC 2011)

Authors: Zhaoqing Zhang; Haitao Mi; Long Chen; Xiaobing Feng; Wei Huo; Zhiyuan Li. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academ; Department of Computer Science, Purdue University, West Lafayette, IN 47907, US

Machine translation (MT), with its broad potential use, has gained increased attention from both researchers and software vendors. To generate high-quality translations, however, MT decoders can be highly computation intensive. With significant raw computing power, multi-core microprocessors have the potential to speed up MT software on desktop machines. However, retrofitting existing MT decoders is a nontrivial task. Race conditions and atomicity issues are among the complications that make parallelization difficult. In this article, we show that, to parallelize a state-of-the-art MT decoder, it is much easier to overcome such difficulties by using a process-based parallelization method, called functional task parallelism, than by using conventional thread-based methods. We achieve a speedup of 7.60 times on an 8-core desktop machine while making significantly fewer changes to the original sequential code than would be required with multiple threads.
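To put the reported speedup in context, the sketch below derives two standard scalability metrics from the numbers in the abstract: parallel efficiency and the experimentally determined serial fraction (the Karp-Flatt metric). Neither metric is reported in the abstract; they are computed here purely as an illustration, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def karp_flatt_serial_fraction(speedup: float, cores: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1 / speedup - 1 / cores) / (1 - 1 / cores)


# Figures quoted in the abstract: 7.60 times speedup on an 8-core machine.
efficiency = parallel_efficiency(7.60, 8)
serial_fraction = karp_flatt_serial_fraction(7.60, 8)

print(f"Parallel efficiency: {efficiency:.1%}")
print(f"Serial fraction:     {serial_fraction:.2%}")
```

With these inputs the efficiency is 95% and the implied serial fraction is roughly 0.75%, meaning that nearly all of the decoder's run time was parallelized.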

Keywords: Decoding; Instruction sets; Load modeling; Computational modeling; Memory management; Sorting

A Priority-Aware NoC to Reduce Squashes in Thread Level Speculation for Chip Multiprocessors

International Symposium on Parallel and Distributed Processing with Applications, ISPA

Authors: Wenbo Dai; Hong An; Qi Li; Gongming Li; Bobin Deng; Shilei Wu; Xiaomei Li; Yu Liu. School of Computer Science and Technology, University of Science and Technology, Hefei, China; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing, China

Thread Level Speculation (TLS) is a technique that aims to boost the performance of sequential programs running on Chip Multiprocessors (CMPs) by automatically parallelizing them. It relieves programmers of the heavy task of parallel programming. However, its performance may suffer from frequent squashing caused by inter-thread data dependency violations. In this paper, we propose a Network-on-Chip (NoC) for CMPs that employs a priority-aware packet arbitration policy. Packet scheduling guided by this policy reduces the occurrence of TLS squashes. Simulation results with 5 applications show that our policy reduces squashes by 22% in the best case and 15% on average. Moreover, our priority-aware approach could be generalized to similar scenarios in which different threads running on a CMP have different priorities.
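The general idea of priority-aware arbitration can be sketched as follows. The abstract does not say how priorities are assigned, so this example assumes that packets from less speculative threads win arbitration, with arrival order as the tie-breaker. The `Packet` fields and function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Packet:
    thread_order: int   # hypothetical field: 0 is the least speculative thread
    arrival_cycle: int  # cycle at which the packet reached the router
    payload: str


def priority_arbitrate(candidates: Iterable[Packet]) -> Packet:
    """Grant the output port to the least speculative thread's packet.

    Assumed policy: lower thread_order wins; ties go to the earlier arrival.
    """
    return min(candidates, key=lambda p: (p.thread_order, p.arrival_cycle))


def fifo_arbitrate(candidates: Iterable[Packet]) -> Packet:
    """Priority-oblivious baseline: the earliest arrival wins."""
    return min(candidates, key=lambda p: p.arrival_cycle)


# Three packets competing for the same output port in one cycle.
contenders = [
    Packet(thread_order=2, arrival_cycle=10, payload="speculative store"),
    Packet(thread_order=0, arrival_cycle=12, payload="non-speculative store"),
    Packet(thread_order=1, arrival_cycle=11, payload="speculative load"),
]

print(priority_arbitrate(contenders).payload)
print(fifo_arbitrate(contenders).payload)
```

In this example the priority-aware arbiter forwards the non-speculative store even though it arrived last, whereas the FIFO baseline forwards the most speculative thread's packet first.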

Keywords: Instruction sets; Nickel; Art; Switches; Scalability; Protocols
