With the further development and wide acceptance of cloud computing, many companies and universities have decided to take advantage of it in their own data centers, an arrangement known as private clouds. Since private clouds have som...
In contrast with public clouds, private clouds have some unique features, especially with respect to workflow scheduling. The trade-off between power and performance, of course, remains one of the key concerns. Building on our previous research, in this paper we propose a hybrid energy-efficient scheduling algorithm that uses dynamic migration. Experiments show that it not only reduces response time and conserves more energy but also achieves a higher level of load balancing.
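The abstract does not detail the migration policy itself. Purely to illustrate what one dynamic-migration step in such a scheduler could look like, here is a minimal Python sketch; the function name, the load threshold, and the host/task representation are all hypothetical and are not taken from the paper:

```python
def pick_migration(hosts, threshold=0.25):
    """Illustrative dynamic-migration step (not the paper's algorithm):
    if the load gap between the busiest and the idlest host exceeds a
    threshold, move one task from the former to the latter.

    hosts: dict mapping host name -> list of task loads (floats).
    Returns (task, source, destination) or None if already balanced.
    """
    load = {h: sum(ts) for h, ts in hosts.items()}
    hot = max(load, key=load.get)    # most loaded host
    cold = min(load, key=load.get)   # least loaded host
    if load[hot] - load[cold] <= threshold or not hosts[hot]:
        return None                  # imbalance too small to justify a move
    task = min(hosts[hot])           # migrate the smallest task on the hot host
    hosts[hot].remove(task)
    hosts[cold].append(task)
    return (task, hot, cold)

if __name__ == "__main__":
    hosts = {"a": [0.5, 0.4], "b": [0.1]}
    print(pick_migration(hosts))
```

A real scheduler would of course also weigh migration cost and power states; this sketch only shows the rebalancing decision.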
Machine translation (MT), with its broad potential use, has gained increased attention from both researchers and software vendors. To generate high-quality translations, however, MT decoders can be highly computationally intensive. With significant raw computing power, multi-core microprocessors have the potential to speed up MT software on desktop machines. Retrofitting existing MT decoders, however, is a nontrivial task: race conditions and atomicity issues are among the complications that make parallelization difficult. In this article, we show that such difficulties are much easier to overcome when parallelizing a state-of-the-art MT decoder with a process-based parallelization method, called functional task parallelism, than with conventional thread-based methods. We achieve a 7.60× speedup on an 8-core desktop machine while making significantly fewer changes to the original sequential code than multiple threads would require.
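The core idea of process-based parallelism is that worker processes share no mutable state, so the race conditions and atomicity issues mentioned above cannot arise by construction. A minimal Python sketch of this shape (the `decode` function is a trivial stand-in, not the paper's decoder):

```python
from multiprocessing import Pool

def decode(sentence):
    # Stand-in for an MT decoder kernel: here we just reverse the words.
    # The key property is that each call is a pure function of its input,
    # so worker processes share no mutable state and no races can occur.
    return " ".join(reversed(sentence.split()))

def decode_corpus(sentences, workers=8):
    # Each sentence is decoded in a separate OS process; Pool.map gathers
    # results in input order, matching the sequential decoder's output.
    with Pool(workers) as pool:
        return pool.map(decode, sentences)

if __name__ == "__main__":
    print(decode_corpus(["hello world", "machine translation"], workers=2))
```

Because processes communicate only through serialized inputs and outputs, the sequential code needs almost no restructuring, which is the point the article makes against thread-based retrofitting.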
This paper introduces PartitionSim, a parallel simulator for future thousand-core processors with software-managed cache coherence. The purpose of PartitionSim is to improve the simulation performance of many-core architectures at the cost of only a small sacrifice in accuracy. To achieve this goal, we propose a novel technique called timing partition. Timing partition is based on the observation that, in a target system, interacting components communicate with each other and therefore impose simulation synchronization, whereas non-interacting components do not communicate and thus allow asynchronous simulation. It divides the target timing models into two groups: a non-interacting group and an interacting group. Non-interacting timing models are simulated by host threads that synchronize rarely with each other, improving speed with little loss of accuracy, while interacting timing models are simulated by host threads that synchronize strictly with each other to preserve accuracy. Using PartitionSim, we have simulated a target composed of thousands of cores on a 16-core SMP machine. The evaluation results show that PartitionSim scales well, with near-linear speedup and considerable performance (up to 25 MIPS), at the cost of little accuracy loss (0.92% on average).
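The two synchronization regimes can be made concrete with a small scheduling sketch, assuming (this is our illustration, not PartitionSim's implementation) that non-interacting models advance a whole time quantum at once while interacting models advance in lockstep, one cycle at a time:

```python
def timing_partition_schedule(cores, total_cycles, quantum):
    """Return the order in which core timing models are advanced.

    cores: list of (name, interacting_flag) pairs. Interacting cores are
    advanced one cycle at a time in lockstep (strict synchronization);
    non-interacting cores run a whole quantum before re-syncing.
    Each trace entry is (core, start_cycle, end_cycle).
    """
    interacting = [n for n, flag in cores if flag]
    free = [n for n, flag in cores if not flag]
    trace = []
    for start in range(0, total_cycles, quantum):
        end = min(start + quantum, total_cycles)
        # Non-interacting group: each core runs the full quantum alone.
        for n in free:
            trace.append((n, start, end))
        # Interacting group: lockstep, cycle by cycle.
        for cyc in range(start, end):
            for n in interacting:
                trace.append((n, cyc, cyc + 1))
    return trace
```

With a quantum of Q cycles, a non-interacting core incurs one synchronization point per Q cycles instead of one per cycle, which is where the speedup in the abstract comes from.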
With the prevalence of multi-core processors, embedded clusters increasingly deploy SMP nodes to gain more computing power. MPI inter-process communication, a crucial issue, suffers from the tension between high performance and embedded constraints. Moreover, there is a large performance gap between intra- and inter-node communication across different infrastructures. In this paper, we design a virtual communication system called SMVN, which extends the shared-memory mechanism typically used in the intra-node case to the inter-node case. SMVN utilizes the HT inter-chip interconnect interface of Godson-3A SMP nodes to build a mesh topology. It is Ethernet-compatible, simulating the bottom layers of the TCP/IP protocol. With this design, the node interconnection can dispense with NICs, cables, and switches. Furthermore, we exploit a zero-copy scheme and other optimizations to improve performance. We port the MPICH2 library via its socket channel and formulate its process allocation. MPI latency and bandwidth tests show that the performance difference between the two levels is small: the inter-node bandwidth is 27.3 MB/s, which is more than twice the theoretical peak of 100 Mb Ethernet and reaches 84% of the intra-node performance.
Aggressive technology scaling makes chip multiprocessors increasingly error-prone. Core-level fault-tolerant approaches bind two cores to implement redundant execution and error detection. However, as more cores are integrated into one chip, existing static and dynamic binding schemes suffer from a scalability problem when the violation effects caused by external write operations are considered. In this paper, we present a transparent dynamic binding (TDB) mechanism to address this issue. Learning from static binding schemes, we involve the private caches in holding identical data blocks, thereby reducing global master–slave consistency maintenance to the scale of the private caches. With our fault-tolerant cache coherence protocol, TDB satisfies the objective of private cache consistency and therefore provides excellent scalability and flexibility. Experimental results show that, for a set of parallel workloads, the overall performance of our TDB scheme is very close to that of baseline fault-tolerant systems, outperforming dynamic core coupling by 9.2%, 10.4%, 18%, and 37.1% for 4, 8, 16, and 32 cores respectively.
Stencil computations are at the core of a wide range of scientific and engineering applications. Much effort has been put into improving the efficiency of stencil calculations on different platforms, but unfortunately these optimizations are not easy to reuse. In this paper we present PADS, a PAttern-Driven Stencil compiler-based tool with a simple tuning system, to reuse those well-optimized methods and codes. We also suggest extensions to OpenMP that describe high-level data structures in order to facilitate the recognition of various stencil computation patterns. PADS allows programmers to rewrite stencil kernels or to reuse source-to-source translator outputs as optimized stencil template codes with the related tuning parameters. In addition, PADS includes an OpenMP-to-CUDA translator and a code generator that uses the optimized template codes. It also obtains architecture-specific parameters to tune stencils across different GPU platforms. To demonstrate the flexibility and performance portability of our system, we illustrate four different stencil computations: the Laplacian operator with the Jacobi iterative method, the divergence operator, a 3D 25-point stencil, and a 2D heat equation using the ADI method with periodic boundary conditions. PADS succeeds in generating all four stencil codes using different optimization strategies and delivers a promising performance improvement.
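For readers unfamiliar with the pattern, the first of the four examples, a Jacobi sweep of the 5-point Laplacian stencil, is small enough to show in full. This is the textbook kernel, written as a plain Python reference (a real PADS-generated version would of course be a tuned CUDA kernel):

```python
def jacobi_step(grid):
    # One Jacobi sweep of the 5-point Laplacian stencil on a 2D grid:
    # each interior cell becomes the average of its four neighbors,
    # read from the previous iteration; boundary cells are held fixed.
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new
```

Every cell update reads the same fixed neighbor offsets, which is exactly the regularity that lets a tool like PADS recognize the pattern and apply tiling, blocking, or GPU mappings automatically.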
Coverage models are the main technique for evaluating the thoroughness of the dynamic verification of a Design-under-Verification (DUV). However, rather than achieving high coverage, the essential purpose of verification is to expose as many bugs as possible. In this paper, we propose a novel verification methodology that leverages early bug prediction for a DUV to guide and assess the related verification process. Specifically, this methodology utilizes predictive models built upon artificial neural networks (ANNs), which are capable of modeling the relationship between the high-level attributes of a design and its associated bug information. To evaluate the performance of the constructed predictive model, we conduct experiments on several open-source projects. Moreover, we demonstrate the usability and effectiveness of the proposed methodology by elaborating on experiences from our industrial practice. Finally, we discuss the application of our methodology.
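To make the attributes-to-bugs mapping concrete, here is a deliberately minimal single-neuron predictor trained by gradient descent; the paper's ANNs are larger, and the attribute vectors and training data here are purely illustrative:

```python
import math
import random

def train_predictor(samples, epochs=2000, lr=0.5):
    """Minimal single-neuron stand-in for an ANN bug predictor: maps a
    vector of high-level design attributes (e.g. normalized size, code
    churn) to a bug-proneness score in (0, 1). Illustrative only."""
    random.seed(0)
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))      # sigmoid activation
            g = p - y                           # gradient of log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    def predict(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-z))
    return predict
```

Given historical modules labeled buggy or clean, the returned `predict` function scores new modules, and verification effort can be steered toward the high-scoring ones, which is the guiding role the methodology assigns to its predictive models.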
As FPGA feature sizes shrink to nanometers, soft errors increasingly become an important concern for SRAM-based FPGAs. Without considering application-level impact, existing reliability-oriented placement and routing approaches analyze the soft error rate (SER) only at the physical level, and consequently complete the design with suboptimal soft error mitigation. Our analysis shows that the statistical variation of the application-level factor is significant. Hence, in this work we first propose a cube-based analysis to efficiently and accurately evaluate the application-level factor. We then propose a cross-layer optimized placement and routing algorithm that reduces the SER by incorporating the application-level and physical-level factors together. Experimental results show that the average difference in the application-level factor between our cube-based method and golden Monte Carlo simulation is less than 0.01. Moreover, compared with the baseline VPR placement and routing technique, the cross-layer optimized algorithm reduces the SER by 14% with no area or performance overhead.
Computer-supported collaborative learning (CSCL) is an emerging branch of the learning sciences concerned with studying how people can learn together with the help of computers. As an indispensable ingredient, computer med...