Search results - Inner Mongolia University Library

The TH Express high performance interconnect networks

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 357-366

Authors: Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

Interconnection networks play an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with particular emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offload collective operation, and hardware reliable end-to-end communication. A low-level message passing infrastructure and an upper-level message passing service are also proposed. The preliminary performance results demonstrate the efficiency of the TH interconnect interface.

Keywords: HPC; network interface chip (NIC); TH Express interconnect; offload collective operation

Memory Access Analysis of Many-core System with Abundant Bandwidth

IEEE International Symposium on Embedded Multicore SoCs (MCSoC)

Authors: Chuan Tang, Dan Liu, Zuocheng Xing, Peng Yang, Zhe Wang, Jiang Xu. Affiliations: Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China

ISBN: (print) 9781479986712

Many-core systems are currently the main architectural trend. One of the dominant challenges for on-chip many-core systems is the memory wall. Traditional research, however, focuses primarily on limited bandwidth. To address it, many-core systems are equipped with large caches, and many complex memory and cache techniques are adopted to relieve bandwidth pressure and improve cache efficiency. All of these methods incur considerable area and power costs. In this paper, we are motivated by the abundant bandwidth and low latency of optical interconnects. We analyze the memory access characteristics of a 64-core system under high bandwidth, which can be assumed to come from an optical interconnect, and consider the sensitivity of different benchmarks to bandwidth and cache. Finally, we discuss promising basic frameworks suitable for many-core systems with optical interconnects.

Keywords: Bandwidth; Benchmark testing; Optical interconnections; Random access memory; Delays; Hidden Markov models; Computers

HybridSwap: A scalable and synthetic framework for guest swapping on virtualization platform

IEEE Annual Joint Conference: INFOCOM, IEEE Computer and Communications Societies

Authors: Pengfei Zhang, Xi Li, Rui Chu, Huaimin Wang. Affiliations: National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; School of Information Science and Engineering, Central South University, Changsha, China

In IaaS cloud environments, peak memory demand caused by hotspot applications in a virtual machine (VM) often results in performance degradation both inside and outside that VM. Solutions such as host swapping and ballooning have been proposed for memory consolidation and overcommitment. These solutions, however, do not address guest swapping inside the VM. Even when the host holds sufficient memory pages, the guest OS cannot use the host's free pages directly because of the semantic gap between the VMM and the guest OS. Our goal is to alleviate the performance degradation by reducing the disk I/O operations generated by guest swapping. Based on an analysis of the behavioral features of guest swapping, we design HybridSwap, a distributed, scalable framework that organizes the surplus memory of all hosts in a data center into virtual pools for swapping. The framework builds a synthetic swapping mechanism in a peer-to-peer fashion, in which a VM can adaptively choose suitable pools for swapping. We implement a prototype of HybridSwap and evaluate it with different benchmarks. The results demonstrate that our solution improves guest swapping efficiency. In some cases, it shows a 2-5 times performance improvement over the baseline setup.

Keywords: Benchmark testing; Virtualization; Semantics; Servers; Operating systems; Degradation; Instruction sets

Virtual Frame Aggregation: Clustered Channel Access in Wireless Networks

IEEE International Conference on Communications

Authors: Xuan Dong, Shaohe Lv, Chunsheng Zhu, Rukhsana Ruby, Xiaodong Wang, Xingming Zhou, Victor C. M. Leung. Affiliations: National Laboratory of Parallel and Distributed Processing, National University of Defense Technology; Department of Electrical and Computer Engineering, The University of British Columbia

ISBN: (print) 9781467364300

Coordination among users is inevitable in wireless communication for efficient medium access. Even though the data rate of an individual user has increased significantly, the performance of wireless networks has not grown accordingly because of the high MAC coordination overhead. In this paper, we present virtual frame aggregation (VFA), which achieves high coordination efficiency by amortizing the overhead over multiple transmissions. VFA provides a novel way to construct a winner cluster and allows the winners to transmit without interruption. Specifically, in a multicarrier network, every contending node chooses a subcarrier, and the nodes are ordered by the index of the chosen subcarrier. When a subcarrier is chosen by two or more nodes, an additional slot is used to reorder the collided nodes. Finally, all ordered nodes form a cluster, and their transmissions are issued sequentially without interruption. Simulation results show that two slots are usually enough to construct a sufficiently large winner cluster. Moreover, VFA achieves a throughput gain over IEEE 802.11 of as much as 120%, with better fairness, under various scenarios.

Keywords: Wireless communication; Antennas; Wires
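
The ordering step described in the abstract can be sketched as a small simulation. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_winner_cluster`, the uniform random subcarrier choice, and the rule that nodes still colliding after the second slot are left out of the cluster are all hypothetical.

```python
import random


def build_winner_cluster(node_ids, num_subcarriers, rng):
    """Order contending nodes with at most two contention slots.

    Returns (cluster, unresolved): the ordered winners and the nodes that
    still collide after the second slot (assumed here to be left out).
    """
    # Slot 1: every contending node picks one subcarrier (assumed uniform).
    first = {}
    for node in node_ids:
        first.setdefault(rng.randrange(num_subcarriers), []).append(node)

    cluster, unresolved = [], []
    # Nodes are ordered by the index of the subcarrier they chose.
    for subcarrier in sorted(first):
        group = first[subcarrier]
        if len(group) == 1:
            cluster.append(group[0])
            continue
        # Slot 2: nodes that collided pick again to be reordered among themselves.
        second = {}
        for node in group:
            second.setdefault(rng.randrange(num_subcarriers), []).append(node)
        for subcarrier2 in sorted(second):
            if len(second[subcarrier2]) == 1:
                cluster.append(second[subcarrier2][0])
            else:
                unresolved.extend(second[subcarrier2])
    return cluster, unresolved


if __name__ == "__main__":
    winners, losers = build_winner_cluster(list(range(10)), 16, random.Random(0))
    print("transmission order:", winners, "left out:", losers)
```

The winners would then transmit in list order; how the real protocol signals subcarrier choices or handles nodes that remain unresolved is not specified in the abstract.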

Design, Implementation and Evaluation of an Application-Layer Virtualized Network

2015 IEEE Symposium on Service-Oriented System Engineering

Authors: Yiming Zhang, Dongsheng Li, Yijie Wang, Zhigang Sun, Feng Zhao, Jinshu Su. Affiliations: National University of Defense Technology, Changsha, Hunan, CN; School of Computer, National Laboratory for Parallel and Distributed Processing (PDL), Changsha, China

The performance of virtualized networks is critical to cloud applications. "Distributed line graphs" (DLG) are a universal technique for designing network topologies based on arbitrary regular graphs. In this paper, we implement a prototype (a C library) of a DLG-enabled network, called DLG-Kautz, as an application-layer virtualized network service. The effectiveness of our design and implementation is demonstrated through prototype evaluations.

Keywords: Routing; Message systems; Ports (Computers); Servers; Payloads; Prototypes; Libraries
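
For readers unfamiliar with the regular graph family that the name DLG-Kautz points to, the following is a minimal sketch of a plain Kautz digraph. It does not reproduce the DLG technique or the paper's C library; the function names are hypothetical, and it assumes that "Kautz" here refers to the standard Kautz graph, whose vertices are strings with no two equal adjacent symbols.

```python
from itertools import product


def kautz_vertices(d, n):
    """All strings of length n + 1 over d + 1 symbols with no equal adjacent symbols."""
    symbols = range(d + 1)
    return [v for v in product(symbols, repeat=n + 1)
            if all(a != b for a, b in zip(v, v[1:]))]


def kautz_successors(v, d):
    """Out-neighbours: drop the first symbol and append any symbol that differs from the last."""
    return [v[1:] + (x,) for x in range(d + 1) if x != v[-1]]


if __name__ == "__main__":
    vertices = kautz_vertices(2, 2)
    print(len(vertices), "vertices; successors of", vertices[0], "are",
          kautz_successors(vertices[0], 2))
```

Every vertex has exactly d successors, and the graph has (d + 1) * d**n vertices, which is what makes it a regular graph of the kind the abstract mentions.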

MilkyWay-2: back to the world Top 1

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 343-344

Author: Xiangke LIAO. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

On the 41st Top500 list, announced in June 2013, the MilkyWay-2 system produced by the National University of Defense Technology (NUDT) in China won first place with a LINPACK test result of 33.86 PFLOPS. It has been one and a half years since its predecessor, MilkyWay-1 (TH-1), reached the same place for the first time. On the newest Top500 list, published in November 2013, MilkyWay-2 retained first place.

Robust weighted supervised sparse coding for image classification

International Conference on Communication Technology (ICCT)

Authors: Xiang Zhang, Naiyang Guan, Zhigang Luo. Affiliations: College of Computer, National University of Defense Technology, Changsha, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China

ISBN: (print) 9781467370066

Sparse coding has shown great potential in learning image feature representations. Recently developed methods such as group sparse coding exploit the group relationships among examples and have achieved state-of-the-art results in image classification. However, they suffer from poor robustness in practice. This paper proposes a robust weighted supervised sparse coding method (RWSSC) to address this deficiency. In particular, RWSSC distinguishes the contributions of different classes to the sparse coding through a novel weighting strategy, and removes outliers by imposing l1-regularization over the noisy entries. Benefiting from these strategies, RWSSC can effectively boost the performance of sparse coding in image classification. In addition, we develop a block coordinate descent algorithm to optimize it and prove its convergence. Experimental results of image classification on two popular datasets show that RWSSC quantitatively outperforms representative sparse coding methods.

Keywords: Support vector machines; Robustness; Classification algorithms; Dictionaries; Manganese

Accelerating FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture

IEEE Pacific Rim Conference on Communications, Computers and Signal Processing

Authors: Qinglin Wang, Jie Liu, Xiantao Cui, Guitao Fu, Chunye Gong, Zuocheng Xing. Affiliations: School of Computer Science, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Beijing Satellite Navigation Center, Beijing, China

ISBN: (print) 9781467377898

The coupling of microwaves into apertures plays an important part in many fields of electromagnetic physics and engineering. When the apertures are very narrow, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. Intel's Many Integrated Core (MIC) architecture is a many-core architecture with 512-bit vector units and more than 200 threads. In this paper, we parallelize the FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture. In the implementation, the OpenMP parallel programming model is used to exploit thread parallelism, while loop unrolling and SIMD intrinsic functions are used for vectorization. Compared with the serial version on an Intel Xeon E5-2670 CPU, the implementation on a MIC coprocessor with 57 cores obtains a speedup of 11.57 times. The experimental results also demonstrate that the parallelization scales well. Additionally, we report how the binding between OpenMP threads and hardware threads on the MIC influences performance.

Keywords: Microwave integrated circuits; Finite difference methods; Time-domain analysis; Chlorine; Hardware
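
To make the structure of an FDTD time step concrete, here is a minimal one-dimensional free-space sketch in normalized units. It is a generic textbook-style leapfrog update, not the paper's narrow-slot code and not an OpenMP or SIMD implementation; the function name, grid size, Courant factor of 0.5, and Gaussian source parameters are all assumptions chosen for illustration.

```python
import math


def fdtd_1d(num_cells=200, steps=100, src=100):
    """1D free-space FDTD (leapfrog E/H updates), normalized units, Courant factor 0.5."""
    ez = [0.0] * num_cells  # electric field samples
    hy = [0.0] * num_cells  # magnetic field samples, staggered by half a cell
    for t in range(steps):
        # Update E from the spatial difference of H (each cell is independent).
        for k in range(1, num_cells):
            ez[k] += 0.5 * (hy[k - 1] - hy[k])
        # Soft Gaussian pulse source (hypothetical centre and width).
        ez[src] += math.exp(-0.5 * ((t - 30) / 8.0) ** 2)
        # Update H from the spatial difference of E (each cell is independent).
        for k in range(num_cells - 1):
            hy[k] += 0.5 * (ez[k] - ez[k + 1])
    return ez, hy


if __name__ == "__main__":
    ez, _ = fdtd_1d()
    print("peak |Ez| after 100 steps:", max(abs(v) for v in ez))
```

Within each half-step the per-cell updates do not depend on one another, which is the property that thread-level and vector-level parallelization of such loops relies on.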

An Online/Offline HIBS Scheme for Privacy Protection of People-centric Sensing

International Conference on Industrial Informatics, Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII)

Authors: Peixin Chen, Xiaofeng Wang, Jinshu Su. Affiliations: College of Computer, National University of Defense Technology, Changsha, China; National Key Laboratory for Parallel Distributed Processing, National University of Defense Technology, Changsha, China

ISBN: (print) 9781467383134

People-Centric Sensing (PCS), which collects information closely related to human activity and interactions in society, is entering a flourishing period. Along with its great benefits, PCS poses new security challenges such as data integrity and participant privacy. A Hierarchical Identity-Based Signature (HIBS) scheme can efficiently provide high-integrity messaging, secure communication, and privacy protection for PCS. However, low computational efficiency is the primary obstacle to adopting HIBS schemes in PCS. In this paper, we propose an online/offline HIBS (HIBOOS) scheme for securing PCS. By splitting the signing phase into online and offline procedures, our scheme achieves high signing efficiency.

Keywords: Games; Sensors; Privacy; Public key; Generators; Wireless sensor networks

Improving vertex-frontier based GPU breadth-first search

Journal of Central South University, 2014, Vol. 21, No. 10, pp. 3828-3836

Authors: 杨博, 卢凯, 高颖慧, 徐凯, 王小平, 程志权. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology; College of Computer, National University of Defense Technology; Department of Electronic Science and Engineering, National University of Defense Technology; Avatar Science Company

Breadth-first search (BFS) is an important kernel for graph traversal and is used by many graph processing applications. Extensive studies have been devoted to boosting the performance of BFS. As the most effective solution, GPU acceleration achieves a state-of-the-art result of 3.3×10^9 traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier-based GPU BFS algorithm is proposed, with three main features. First, to obtain better workload balance on irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Second, a global duplicate detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. The experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it exhibits a 2-3 times speedup over the state of the art on low-diameter and scale-free graphs on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10^9 edges/s.

Keywords: breadth-first search; GPU; graph traversal; vertex frontier
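
Two of the ideas named in the abstract, removing duplicate vertices from the frontier and switching to a bottom-up step when the frontier is large, can be illustrated with a sequential sketch. This is not the paper's GPU algorithm: the virtual-queue decomposition is not reproduced, the graph is assumed undirected, and the function name and the `bottom_up_threshold` switching rule are hypothetical.

```python
def frontier_bfs(adj, source, bottom_up_threshold=0.5):
    """Level-synchronous BFS over an undirected graph given as adjacency lists.

    Returns the BFS level of every vertex (-1 if unreachable).
    """
    n = len(adj)
    dist = [-1] * n
    dist[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        if len(frontier) > bottom_up_threshold * n:
            # Bottom-up step: each unvisited vertex looks for a parent in the frontier.
            in_frontier = set(frontier)
            next_frontier = [v for v in range(n)
                             if dist[v] == -1
                             and any(u in in_frontier for u in adj[v])]
        else:
            # Top-down step: expand the frontier; the set drops duplicate vertices.
            next_frontier = list({v for u in frontier
                                  for v in adj[u] if dist[v] == -1})
        for v in next_frontier:
            dist[v] = level
        frontier = next_frontier
    return dist


if __name__ == "__main__":
    graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [4 - 1]]
    print(frontier_bfs(graph, 0))
```

Both step types produce the same levels; the choice between them only affects how much work is done per level, which is why a switching heuristic is a tuning parameter rather than part of the result.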
