Search Results - Inner Mongolia University Library

IEEE Conference on Wireless Communications and Networking

Authors: Yu Gu, Baohua Zhao, Yusheng Ji, Jie Li. Affiliations: Information Systems Architecture Science Research Division, National Institute of Informatics, Tokyo, Japan; State Key Laboratory of Networking and Switching Technology, Beijing, China; Department of Computer Science, University of Tsukuba, Tsukuba Science City, Ibaraki, Japan

We address the optimal sink scheduling problem in wireless sensor networks (WSNs). The problem is inherently difficult because sink scheduling and data routing are tightly coupled. Previous approaches either have questionable performance because they do not consider scheduling and routing jointly, or rely on relaxed constraints. We aim to fill this gap in the research. First, by discretizing continuous time, we develop a novel bounding technique that connects time-varying routes with the placement of sinks. This bounding technique transforms time-related constraints into pattern-based ones and allows us to formulate the optimization mathematically in a pattern-based way. Solving this optimization directly is intractable; therefore, based on column generation (CG), we develop a computationally efficient algorithm that reduces the complexity by decomposing the problem into sub-problems and solving them iteratively to approach optimality. Simulations demonstrate the efficiency of the algorithm and substantiate the importance of sink mobility in energy-constrained sensor networks.

Keywords: Wireless sensor networks; Optimization; Relays; Routing; Processor scheduling; Complexity theory; USA Councils
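
The pattern-based idea can be pictured with a generic lifetime-maximization linear program and the column-generation loop that solves it. This is a hedged sketch, not the authors' model: it assumes the objective is network lifetime, and the symbols are introduced here for illustration only. $\mathcal{P}$ is the set of patterns (each a sink placement plus the routing used under it), $\mathcal{P}' \subseteq \mathcal{P}$ the patterns generated so far, $t_p$ the time spent in pattern $p$, $E_i$ the initial energy of sensor $i \in \mathcal{N}$, $e_{i,p}$ its power draw under pattern $p$, and $\pi_i \ge 0$ the optimal dual prices of the master's energy constraints.

```latex
\begin{aligned}
\text{(restricted master)}\quad & \max_{t \ge 0} \sum_{p \in \mathcal{P}'} t_p
  \quad \text{s.t.} \quad \sum_{p \in \mathcal{P}'} e_{i,p}\, t_p \le E_i \quad \forall i \in \mathcal{N} \\
\text{(pricing)}\quad & \max_{p \in \mathcal{P}} \; \bar{c}_p = 1 - \sum_{i \in \mathcal{N}} \pi_i \, e_{i,p}
\end{aligned}
```

If the best pattern has $\bar{c}_p > 0$, it is added to $\mathcal{P}'$ and the master is re-solved; once no pattern has positive reduced cost, the master's solution is optimal over the full pattern set. The abstract does not specify the exact sub-problems the authors use.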

Design and implementation of communication system of the Dawning 6000 supercomputer

Frontiers of Computer Science in China, 2010, vol. 4, no. 4, pp. 466-474

Authors: Qiang Li, Bo Li, Zhigang Huo, Ninghui Sun. Affiliations: National Research Center for Intelligent Computing Systems, Beijing 100190, China; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100190, China

An increasing number of supercomputers adopt a heterogeneous architecture consisting of both general-purpose CPUs and specialized accelerators. Such a design benefits scalability and power efficiency, but heterogeneity also brings new challenges for the communication system, which must connect heterogeneous components and provide support for programming. The communication system of the Dawning 6000 connects two kinds of heterogeneous processors, Loongson and AMD, and adopts a three-layer architecture with an intra-node layer between heterogeneous components. To connect heterogeneous components efficiently, the system forms a global address space and provides a message transmission mechanism via an in-node global store; over an InfiniBand network, it provides an OS-bypass virtualization method that lets nodes share an InfiniBand card. To facilitate programming on heterogeneous processors, it supports Unified Parallel C (UPC) with a modified compiler based on the global address space. In addition, a special collective network is implemented for collective operations. Results obtained from a prototype system show these features to be both feasible and efficient.

Keywords: hyper parallel processing (HPP); global address space (GAS); virtualization; Dawning 6000; unified parallel C (UPC)

Theoretical Treatment of Sink Scheduling Problem in Wireless Sensor Networks

IEEE Conference on Computer Communications Workshops

Authors: Yu Gu, Yusheng Ji, Jie Li, Baohua Zhao. Affiliations: Information Systems Architecture Science Research Division, National Institute of Informatics; Department of Computer Science, University of Tsukuba; State Key Laboratory of Networking and Switching Technology

ISBN (print): 9781457702495

Sink scheduling, i.e., scheduling multiple sinks among sink sites to balance the traffic burden, is an effective mechanism for improving the energy efficiency of wireless sensor networks (WSNs). Because the problem is inherently difficult (NP-hard in general), existing work on this topic focuses mainly on heuristic/greedy algorithms, and theoretical results are still lacking. In this paper, we fill this research gap with two algorithms. The first is based on column generation (CG): it decomposes the original problem into two sub-problems and solves them iteratively to approach the optimal solution. However, due to its high computational complexity, this algorithm is suitable only for small-scale networks. The second is a polynomial-time algorithm based on relaxation techniques that obtains an upper bound, which can serve as a performance benchmark for other algorithms on this problem. Through comprehensive simulations, we evaluate the efficiency of the proposed algorithms.

Keywords: Wireless sensor networks; Performance metrics; Sinks; Relaxation Techniques; Energy efficiency; original problem; algorithms

Building a personal high performance computer with heterogeneous processors

International Conference on Grid and Cloud Computing

Authors: Qiang Li, Zhigang Huo, Ninghui Sun. Affiliations: National Research Center for Intelligent Computing Systems, Beijing 100190, China; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing, China; Graduate University of Chinese Academy of Sciences, Beijing 100190, China

ISBN (print): 9780769543130

A personal high-performance computer (PHPC) requires both low cost and high performance. Teraflops-class PHPC systems built with special accelerator units such as GPGPUs have been presented, but they suffer from difficulties in programming, compatibility, and applicability. In this paper, we present HPP-PHPC, a hybrid architecture of heterogeneous processors connected by a non-coherent off-chip system bus. The performance of HPP-PHPC is ensured by special processors integrated with vector units and by high-efficiency interconnection between the heterogeneous processors. By adopting general-purpose processors and hardware features such as a global physical address space and synchronization semantics, HPP-PHPC is more compatible with, and more convenient for, the message-passing and PGAS programming models. It is also applicable to more applications, including those with many execution branches. Initial results obtained from our prototype system validate our design. © 2010 IEEE.

Keywords: Semantics

Virtualizing modern high-speed interconnection networks with performance and scalability

Authors: Bo Li, Zhigang Huo, Panyong Zhang, Dan Meng. Affiliations: National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing, China; Graduate University of Chinese Academy of Sciences, Beijing, China

ISBN (print): 9780769542201

As one of the most important enabling technologies of cloud computing, virtualization brings HPC good manageability, online system maintenance, performance isolation, and fault isolation. Furthermore, previous work on VMM-bypass I/O, which virtualizes OS-bypass networks (e.g., InfiniBand), has eased concerns about the performance degradation that accompanies virtualization. In this paper, we address the scalability challenges that virtualized environments impose on OS-bypass networks. The eXtended Reliable Connection (XRC) transport, proposed in modern high-speed interconnection networks to address the scalability problem of large-scale applications, does not work in virtualized environments. To solve this problem, we propose a VM-proof XRC design that eliminates the scalability gap between virtualized and native environments. Prototype evaluation shows that, with our VM-proof XRC design, virtualized modern high-speed interconnection networks can achieve the same raw performance and scalability as a native, non-virtualized environment. Connection memory scalability shows a potential 16-fold improvement on virtualized clusters composed of 16-core nodes. © 2010 IEEE.

Keywords: Scalability
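
The 16-fold figure for 16-core nodes is consistent with a simple per-process connection count. The sketch below rests on assumptions not taken from the paper: with plain RC each process keeps one connection to every process on every other node, with XRC it keeps one connection per remote node shared by all processes there, intra-node peers use shared memory, and every connection costs the same memory. `connections_per_process` and `connection_memory_ratio` are hypothetical helper names.

```python
def connections_per_process(nodes: int, cores_per_node: int) -> tuple[int, int]:
    """Return (rc, xrc) connection counts held by one process.

    Simplified, assumed model of a fully connected job:
    - RC: one connection to every process on every other node.
    - XRC: one connection per other node, shared by all processes on it.
    Intra-node peers are assumed to communicate through shared memory.
    """
    if nodes < 2 or cores_per_node < 1:
        raise ValueError("need at least two nodes and one core per node")
    remote_nodes = nodes - 1
    rc = remote_nodes * cores_per_node
    xrc = remote_nodes
    return rc, xrc


def connection_memory_ratio(nodes: int, cores_per_node: int) -> float:
    """RC connection memory divided by XRC connection memory (equal cost per connection assumed)."""
    rc, xrc = connections_per_process(nodes, cores_per_node)
    return rc / xrc


if __name__ == "__main__":
    rc, xrc = connections_per_process(nodes=64, cores_per_node=16)
    print(rc, xrc, connection_memory_ratio(64, 16))  # 1008 63 16.0
```

Under this model the ratio equals the number of cores per node for any cluster size: RC state grows with the number of remote processes, XRC state only with the number of remote nodes.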

Godson-T: An Efficient Many-Core Architecture for Parallel Program Executions

Journal of Computer Science & Technology, 2009, vol. 24, no. 6, pp. 1061-1073

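Authors: Dongrui Fan (范东睿), Nan Yuan (袁楠), Junchao Zhang (张军超), Yongbin Zhou (周永彬), Wei Lin (林伟), Fenglong Song (宋风龙), Xiaochun Ye (叶笑春), He Huang (黄河), Lei Yu (余磊), Guoping Long (龙国平), Hao Zhang (张浩), Lei Liu (刘磊). Affiliation: Key Laboratory of Computer Systems and Architecture, Institute of Computing Technology, Chinese Academy of Sciences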

Moore's law will grant computer architects ever more transistors for the foreseeable future, and the challenge is how to use them to deliver efficient performance and flexible programmability. We propose a many-core architecture, Godson-T, to meet this challenge. On the one hand, Godson-T features a region-based cache coherence protocol, asynchronous data transfer agents, and hardware-supported synchronization mechanisms, which together allow on-chip resources to be utilized with high efficiency. On the other hand, Godson-T features a highly efficient runtime system, a Pthreads-like programming model, and versatile parallel libraries, which make this many-core design flexibly programmable. This hardware/software co-design methodology bridges high-end computing and mainstream programmers. Experimental evaluations are conducted on a cycle-accurate simulator of Godson-T. The results show that the proposed architecture has good scalability, fast synchronization, high computational efficiency, and flexible programmability.

Keywords: many-core; parallel computing; multithread; data communication; thread synchronization; runtime system

A Synchronization-Based Alternative to Directory Protocol

International Symposium on Parallel and Distributed Processing with Applications, ISPA

Authors: He Huang, Lei Liu, Nan Yuan, Wei Lin, Fenglong Song, Junchao Zhang, Dongrui Fan. Affiliation: Institute of Computing Technology, Key Laboratory of Computer Systems and Architecture, Chinese Academy of Sciences, Beijing, China

Efficient support for cache coherence is extremely important in designing and implementing many-core processors. In this paper, we propose a synchronization-based coherence (SBC) protocol that efficiently supports cache coherence for shared-memory many-core architectures. The unique feature of our scheme is that it does not use a directory at all. Inspired by the scope consistency memory model, our protocol maintains coherence at synchronization points. Within a critical section, processor cores record their write-sets (the lines written in the critical section) with a Bloom filter. When a core releases the lock, its write-set is transferred to a synchronization manager. When another core acquires the same lock, it obtains the write-set from the synchronization manager and invalidates stale data in its local cache. Experimental results show that SBC outperforms a directory-based protocol by an average of 5% in execution time across a suite of scientific applications. Meanwhile, SBC is more cost-effective than a directory-based protocol, which requires a large amount of hardware resources and a huge design-verification effort.

Keywords: Protocols; Coherence; Distributed processing; Application software; Helium; Laboratories; Concurrent computing; Distributed computing; computer architecture; Memory architecture
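
The lock-scoped self-invalidation described in the SBC abstract can be modeled in software to show the flow of write-sets. This is a hedged sketch, not the SBC hardware: the class names, the 256-bit/3-hash filter size, flushing dirty lines to shared memory at release, and accumulating each lock's write-sets by union are all assumptions made for illustration, and the model assumes every shared write happens inside a critical section.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over integer cache-line addresses."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored in a Python int

    def _positions(self, line: int):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{line}".encode()).digest()
            yield int.from_bytes(digest[:4], "little") % self.num_bits

    def add(self, line: int) -> None:
        for pos in self._positions(line):
            self.bits |= 1 << pos

    def might_contain(self, line: int) -> bool:
        # False positives are possible; false negatives are not.
        return all((self.bits >> pos) & 1 for pos in self._positions(line))

    def union(self, other: "BloomFilter") -> None:
        self.bits |= other.bits


class SyncManager:
    """Hypothetical per-lock store of write-sets; there is no directory."""

    def __init__(self) -> None:
        self.write_sets: dict[str, BloomFilter] = {}

    def publish(self, lock: str, write_set: BloomFilter) -> None:
        # Assumption: write-sets released under the same lock are accumulated.
        self.write_sets.setdefault(lock, BloomFilter()).union(write_set)

    def fetch(self, lock: str) -> BloomFilter:
        return self.write_sets.get(lock, BloomFilter())


class Core:
    def __init__(self, memory: dict[int, int], manager: SyncManager) -> None:
        self.memory = memory            # shared memory: line -> value
        self.manager = manager
        self.cache: dict[int, int] = {}  # private cache: line -> value
        self.dirty: set[int] = set()
        self.write_set = BloomFilter()

    def acquire(self, lock: str) -> None:
        incoming = self.manager.fetch(lock)
        for line in list(self.cache):
            if incoming.might_contain(line):
                del self.cache[line]    # self-invalidate a possibly stale line
        self.write_set = BloomFilter()  # start recording a new write-set

    def read(self, line: int) -> int:
        if line not in self.cache:
            self.cache[line] = self.memory.get(line, 0)
        return self.cache[line]

    def write(self, line: int, value: int) -> None:
        self.cache[line] = value
        self.dirty.add(line)
        self.write_set.add(line)

    def release(self, lock: str) -> None:
        for line in self.dirty:
            self.memory[line] = self.cache[line]  # make writes visible
        self.dirty.clear()
        self.manager.publish(lock, self.write_set)


if __name__ == "__main__":
    memory = {0x40: 1}
    manager = SyncManager()
    a, b = Core(memory, manager), Core(memory, manager)
    b.read(0x40)                                        # b caches the old value 1
    a.acquire("L"); a.write(0x40, 2); a.release("L")
    b.acquire("L")                                      # b drops line 0x40
    print(b.read(0x40))                                 # 2
```

A Bloom filter false positive only causes an extra, harmless invalidation, while the absence of false negatives guarantees that no line written under the lock is missed. Because the per-lock union only grows, a practical design would need a way to reset filters, which this sketch omits.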

Early Experiences with Write-Write Design of NFS over RDMA

International Conference on Networking, Architecture, and Storage (NAS)

Authors: Bo Li, Panyong Zhang, Zhigang Huo, Dan Meng. Affiliations: National Research Center for Intelligent Computing Systems, Institute of Computing Technology; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, China

The network file system (NFS) protocol, the de facto standard for sharing files in a distributed environment, has adopted InfiniBand as the underlying transport for SunRPC, an approach known as NFS over RDMA. In the current Read-Write design of NFS over RDMA, NFS write performance is limited because the features of InfiniBand are not fully utilized. In this paper, we take on the challenge of enhancing NFS write performance. We propose and evaluate a new design of SunRPC over RDMA, the Write-Write design. To guarantee the security of our design, we propose an HCA-based memory protection extension to InfiniBand. Evaluations show that our Write-Write design increases kernel-to-kernel RPC bandwidth by 15% to 27%. In real-disk tests, our Write-Write design gains 15% to 22% in multi-client benchmarks compared with the Read-Write design.

Keywords: Security; Protection; Bandwidth; Scalability; computer architecture; computer networks; Distributed computing; File systems; Benchmark testing; Protocols

Parallelization of LU decomposition on the godson-Tv1 many-core architecture

Jisuanji Xuebao/Chinese Journal of Computers, 2009, vol. 32, no. 11, pp. 2157-2167

Authors: Guo-Ping Long, Dong-Rui Fan. Affiliation: Key Laboratory of Computer Systems and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

The many-core architecture is becoming an increasingly promising computing platform owing to advances in semiconductor technology. LU decomposition is a widely used kernel in both scientific and engineering computations. Although there is much related work on traditional parallel architectures, little work has focused on parallelizing it on many-core architectures. This paper investigates the problem from three aspects: load balancing, latency hiding, and performance modeling. The work makes three contributions. First, a novel load balancing technique is introduced to overcome the limitations of 2D scatter decomposition; experimental results show that the proposed scheme achieves a 20% performance improvement without optimization and a 40% improvement after optimization. Second, an analytical performance model is presented. A quantitative experimental study shows that by carefully hiding memory latency through the on-chip memory hierarchy, and with a selected block size, measured performance can approach the theoretical upper bound. Third, the experimental results reveal two primary causes that make the theoretical speedup hard to achieve: limited DRAM bandwidth and resource contention in the on-chip network.

Keywords: Parallel architectures
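
For context on the load-balancing problem this abstract mentions, the baseline 2D scatter (block-cyclic) mapping can be checked with a simple work count. This is an illustrative sketch under stated assumptions: it counts only trailing-submatrix block updates of a right-looking blocked LU, ignores panel factorization, triangular solves, and communication, and the function names are hypothetical. It does not reproduce the paper's own load-balancing technique, which the abstract does not describe.

```python
from collections import Counter


def scatter_owner(i: int, j: int, p: int, q: int) -> tuple[int, int]:
    """2D scatter (block-cyclic) owner of block (i, j) on a p x q core grid."""
    return (i % p, j % q)


def trailing_update_load(nb: int, p: int, q: int) -> Counter:
    """Count block updates per core over all steps of a right-looking blocked LU.

    At step k every block (i, j) with i > k and j > k receives one
    trailing-submatrix update A[i][j] -= L[i][k] * U[k][j]. Panel
    factorization, triangular solves, and communication are ignored.
    """
    load = Counter({(a, b): 0 for a in range(p) for b in range(q)})
    for k in range(nb):
        for i in range(k + 1, nb):
            for j in range(k + 1, nb):
                load[scatter_owner(i, j, p, q)] += 1
    return load


def imbalance(load: Counter) -> float:
    """Busiest core's work divided by the mean work (1.0 means perfect balance)."""
    values = list(load.values())
    return max(values) / (sum(values) / len(values))


if __name__ == "__main__":
    load = trailing_update_load(nb=8, p=2, q=2)
    print(load[(0, 0)], load[(1, 1)], round(imbalance(load), 3))  # 28 44 1.257
```

In this 8 x 8 block example on a 2 x 2 grid, the core owning odd rows and odd columns performs about 57% more updates than the core owning even ones, because the last block row and column (index 7) are odd and remain in the shrinking trailing matrix longest.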

HPP-Controller: An intra-node controller designed for connecting heterogeneous CPUs

IEEE International Conference on Cluster Computing

Authors: Qiang Li, Panyong Zhang, Ninghui Sun. Affiliations: National Research Center of Intelligent Computing Systems, Institute of Computing Technology; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing, China

Heterogeneity is considered a solution for supercomputers to scale to petascale. Many systems composed of general-purpose CPUs and special processing units such as Cell processors, GPGPUs, and FPGAs have been implemented. In these systems, the CPU needs to interact with the special processing units to process data together, so communication between these heterogeneous processing units becomes a key problem, and the communication subsystem should provide low latency and high bandwidth. In this paper, we propose HPP-Controller, which is designed to connect two different types of CPUs (AMD and Loongson) in one node. It connects heterogeneous CPUs on top of a non-coherent HyperTransport (HT) fabric and supports a global physical address space. We implement an FPGA-based prototype and evaluate it experimentally. Initial results show that HPP-Controller has a low latency of 0.75 μs and high bandwidth close to that of the HT links.

Keywords: Joining processes; Central Processing Unit; Delay; Bandwidth; Supercomputers; Petascale computing; Control systems; Field programmable gate arrays; Access protocols; Sun
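
A global physical address space, as supported by HPP-Controller, lets a load or store name memory attached to either CPU in the node. The sketch below illustrates only the idea of routing an access by its address. The 8-bit processor field, the 40-bit offset, and the names `GlobalAddress` and `route` are hypothetical; they are not HPP-Controller's actual encoding, which the abstract does not give.

```python
from dataclasses import dataclass

# Hypothetical split of a global physical address (illustration only).
PROC_BITS = 8
OFFSET_BITS = 40
OFFSET_MASK = (1 << OFFSET_BITS) - 1


@dataclass(frozen=True)
class GlobalAddress:
    processor_id: int  # CPU whose local memory backs this address
    offset: int        # physical offset inside that CPU's memory

    def encode(self) -> int:
        if not 0 <= self.processor_id < (1 << PROC_BITS):
            raise ValueError("processor_id out of range")
        if not 0 <= self.offset <= OFFSET_MASK:
            raise ValueError("offset out of range")
        return (self.processor_id << OFFSET_BITS) | self.offset

    @staticmethod
    def decode(addr: int) -> "GlobalAddress":
        return GlobalAddress(addr >> OFFSET_BITS, addr & OFFSET_MASK)


def route(addr: int, local_id: int) -> str:
    """Serve an access locally or forward it across the intra-node fabric."""
    target = GlobalAddress.decode(addr)
    if target.processor_id == local_id:
        return f"local memory @ {target.offset:#x}"
    return f"forward to processor {target.processor_id} @ {target.offset:#x}"


if __name__ == "__main__":
    addr = GlobalAddress(processor_id=1, offset=0x1000).encode()
    print(route(addr, local_id=0))  # forward to processor 1 @ 0x1000
    print(route(addr, local_id=1))  # local memory @ 0x1000
```

Placing the target processor in the high address bits means the controller can make its forwarding decision from the address alone, without any coherence lookup, which fits the non-coherent fabric the abstract describes.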
