Search results - Inner Mongolia University Library

LazyGraph: Lazy Data Coherency for Replicas in Distributed Graph-Parallel Computation

ACM SIGPLAN Notices, 2018, Vol. 53, No. 1, pp. 276-289

Authors: Wang, Lei; Zhuang, Liangji; Chen, Junhang; Cui, Huimin; Lv, Fang; Liu, Ying; Feng, Xiaobing

Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China

ISBN: (print) 9781450349116

Replicas of a vertex play an important role in existing distributed graph processing systems: they allow a single vertex to be processed in parallel by multiple machines and let each machine access remote neighbors locally, without remote access. However, replicas of vertices introduce a data coherency problem. Existing distributed graph systems treat the replicas of a vertex v as one atomic, indivisible vertex and use an eager data coherency approach to guarantee replica atomicity. In the eager data coherency approach, any change to vertex data must be immediately communicated to all replicas of v, leading to frequent global synchronizations and communications. In this paper, we propose a lazy data coherency approach, called LazyAsync, which treats replicas of a vertex as independent vertices and maintains data coherency by computation rather than by the communication used in the existing eager approach. Our approach automatically selects some data coherency points from the graph algorithm and makes all replicas share the same global view only at such points, which means the replicas may maintain different local views between any two adjacent data coherency points. Based on PowerGraph, we develop a distributed graph processing system, LazyGraph, to implement the LazyAsync approach and exploit graph-aware optimizations. On a 48-node EC2-like cluster, LazyGraph outperforms PowerGraph on four widely used graph algorithms across a variety of real-world graphs, with speedups ranging from 1.25x to 10.69x. © 2018 ACM.

Keywords: Graph theory
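
As a rough illustration of the eager versus lazy coherency idea in the LazyGraph abstract above, the toy sketch below counts synchronizations when replicas of one vertex apply additive updates. It is not the LazyGraph or PowerGraph API: the `Replica` class, the additive update rule, and the fixed `sync_every` interval are all assumptions made for illustration (the abstract says coherency points are selected automatically from the graph algorithm).

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica of one vertex: a shared global view plus a local delta."""
    value: float = 0.0  # last globally agreed value
    delta: float = 0.0  # local updates not yet shared with other replicas

    def local_view(self) -> float:
        return self.value + self.delta


def run(updates, num_replicas, sync_every):
    """Apply (replica_index, amount) updates, synchronizing every `sync_every` updates.

    sync_every=1 mimics eager coherency; larger values mimic lazy coherency
    with coherency points at fixed intervals (an assumption of this sketch).
    Returns (final value, number of synchronizations).
    """
    replicas = [Replica() for _ in range(num_replicas)]
    syncs = 0

    def synchronize():
        nonlocal syncs
        # Merge all pending local deltas into one global view.
        total = replicas[0].value + sum(r.delta for r in replicas)
        for r in replicas:
            r.value, r.delta = total, 0.0
        syncs += 1

    for step, (idx, amount) in enumerate(updates, start=1):
        replicas[idx].delta += amount
        if step % sync_every == 0:
            synchronize()
    if any(r.delta for r in replicas):
        synchronize()  # final coherency point for leftover deltas
    return replicas[0].value, syncs


updates = [(i % 3, 1.0) for i in range(12)]
eager_value, eager_syncs = run(updates, num_replicas=3, sync_every=1)
lazy_value, lazy_syncs = run(updates, num_replicas=3, sync_every=4)
print(eager_value, eager_syncs, lazy_value, lazy_syncs)
```

Both runs reach the same final value, but the lazy run synchronizes far less often; between coherency points each replica's `local_view()` may differ from the others.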

Data motifs: A lens towards fully understanding big data and AI workloads

arXiv, 2018

Authors: Gao, Wanling; Zhan, Jianfeng; Wang, Lei; Luo, Chunjie; Zheng, Daoyi; Tang, Fei; Xie, Biwei; Zheng, Chen; Wen, Xu; He, Xiwen; Ye, Hainan; Ren, Rui

Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences; Beijing Academy of Frontier Sciences and Technology

The complexity and diversity of big data and AI workloads make them difficult to understand. This paper proposes a new approach to modelling and characterizing big data and AI workloads. We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs. Each class of unit of computation captures the common requirements while being reasonably divorced from individual implementations, and hence we call it a data motif. For the first time, among a wide variety of big data and AI workloads, we identify eight data motifs that take up most of the run time of those workloads: Matrix, Sampling, Logic, Transform, Set, Graph, Sort and Statistic. We implement the eight data motifs on different software stacks as the micro benchmarks of an open-source big data and AI benchmark suite, BigDataBench 4.0 (publicly available from http://***/BigDataBench), and perform a comprehensive characterization of those data motifs from the perspective of data sizes, types, sources, and patterns as a lens towards fully understanding big data and AI workloads. We believe the eight data motifs are promising abstractions and tools not only for big data and AI benchmarking, but also for domain-specific hardware and software co-design. Copyright © 2018, The Authors. All rights reserved.

Keywords: Big data

Labeled von Neumann architecture for Software-Defined Cloud

Journal of Computer Science & Technology, 2017, Vol. 32, No. 2, pp. 219-223

Authors: Yun-Gang Bao; Sa Wang, Member, CCF

Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

As cloud computing moves forward rapidly, cloud providers have been encountering great challenges: long tail latency, low utilization, and high interference. They intend to co-locate multiple workloads on a single server to improve resource utilization. But the co-located applications suffer from severe performance interference and long tail latency, which lead to an unpredictable user experience. To meet these challenges, software-defined cloud has been proposed to facilitate tighter coordination among application, operating system and hardware. Users' quality of service (QoS) requirements could be propagated all the way down to the hardware with differential management mechanisms. However, there is little hardware support to maintain and guarantee users' QoS requirements. To this end, this paper proposes Labeled von Neumann architecture (LvNA), which introduces a labelling mechanism to convey more of the software's semantic information, such as QoS and security, to the underlying hardware. LvNA is able to correlate labels with various entities, e.g., virtual machine, process and thread, propagate labels throughout the whole machine, and program differentiated services based on rules. We consider LvNA to be a fundamental hardware support for the software-defined cloud.

Keywords: software-defined cloud; von Neumann architecture; tail latency; performance interference

Semi-supervised classification method of SAR images using spectral clustering in contourlet domain

Journal of Physics: Conference Series, 2020, Vol. 1486, No. 4

Authors: Kaiwen Jiang; Degan Zhang; Haixia Xu

Affiliations: Key Laboratory of Computer Vision and System (Tianjin University of Technology), Ministry of Education, 300384, China; Tianjin Key Lab of Intelligent Computing & Novel Software Technology, Tianjin University of Technology, Tianjin, China; School of Electronic and Information Engineering, Tianjin Vocational Institute, Tianjin 300410, China

This paper proposes a new semi-supervised classification method for SAR images in the contourlet domain. To obtain better and faster performance, the particle swarm optimization (PSO) algorithm and the contourlet domain are used in place of the traditional k-means algorithm. PSO finds the global optimum by performing a global search over the whole solution space. The contourlet transform is then applied before constructing the similarity matrix, to extract more effective eigenvalues. The experimental results in Section 5 show that the proposed algorithm achieves better classification results than the traditional k-means algorithm in terms of running time, classification accuracy and Kappa coefficient.

Keywords:
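
The abstract above reports results by Kappa coefficient without defining it. Assuming it means the standard Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), a minimal sketch of computing it from a confusion matrix is given below. The function name and the 2x2 matrix are made up for illustration and are not taken from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: truth, columns: prediction)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Illustrative data only: 100 pixels, two classes.
example = [[45, 5],
           [10, 40]]
kappa = cohen_kappa(example)
print(round(kappa, 3))
```

Here the observed agreement is 0.85 and the chance agreement is 0.5, so kappa comes out at 0.7, lower than the raw accuracy because agreement expected by chance is discounted.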

Tetris: Re-architecting convolutional neural network computation for machine learning accelerators

arXiv, 2018

Authors: Lu, Hang; Wei, Xin; Lin, Ning; Yan, Guihai; Li, Xiaowei

Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Inference efficiency is the predominant consideration in designing deep learning accelerators. Previous work mainly focuses on skipping zero values to deal with the considerable amount of ineffectual computation, while zero bits in non-zero values, another major source of ineffectual computation, are often ignored. The reason lies in the difficulty of extracting essential bits during multiply-and-accumulate (MAC) operations in the processing element. Based on the fact that zero bits make up as much as 68.9% of the overall weights of modern deep convolutional neural network models, this paper first proposes a weight kneading technique that simultaneously eliminates ineffectual computation caused by both zero-value weights and zero bits in non-zero weights. In addition, a split-and-accumulate (SAC) computing pattern that replaces conventional MAC, together with the corresponding hardware accelerator design called Tetris, is proposed to support weight kneading at the hardware level. Experimental results show that Tetris can speed up inference by up to 1.50x and improve power efficiency by up to 5.33x compared with state-of-the-art baselines. Copyright © 2018, The Authors. All rights reserved.

Keywords: Deep neural networks
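
To make the "zero bits in non-zero weights" observation from the Tetris abstract above concrete, the sketch below measures the zero-bit fraction of a few fixed-point weights and computes a dot product by touching only the essential (1) bits, shifting once per bit position at the end. This is one plausible software reading of split-and-accumulate, not the paper's hardware design: the sign-magnitude encoding, the 8-bit width, the function names and the sample values are all assumptions.

```python
def zero_bit_fraction(weights, bits=8):
    """Fraction of zero bits across the magnitudes of fixed-point weights."""
    ones = sum(bin(abs(w)).count("1") for w in weights)
    return 1.0 - ones / (bits * len(weights))


def mac(weights, activations):
    """Conventional multiply-and-accumulate dot product."""
    return sum(w * a for w, a in zip(weights, activations))


def split_and_accumulate(weights, activations, bits=8):
    """Dot product that only processes 1 bits of each weight magnitude.

    Activations are accumulated into one segment per bit position; each
    segment is shifted by its position once, after all weights are processed.
    Assumes abs(w) < 2**bits for every weight.
    """
    segments = [0] * bits
    for w, a in zip(weights, activations):
        sign = -1 if w < 0 else 1
        magnitude = abs(w)
        for b in range(bits):
            if (magnitude >> b) & 1:  # zero bits are skipped entirely
                segments[b] += sign * a
    return sum(seg << b for b, seg in enumerate(segments))


weights = [3, 0, -4, 17]       # illustrative values only
activations = [2, 5, 1, -3]
sac_result = split_and_accumulate(weights, activations)
mac_result = mac(weights, activations)
print(sac_result, mac_result, zero_bit_fraction(weights))
```

Both functions return the same dot product, while the bit-level version performs additions only for the five 1 bits among the 32 weight bits in this example.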

Efficient graph computation for Node2Vec

arXiv, 2018

Authors: Zhou, Dongyan; Niu, Songjie; Chen, Shimin

Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Node2Vec is a state-of-the-art general-purpose feature learning method for network analysis. However, current solutions cannot run Node2Vec on large-scale graphs with billions of vertices and edges, which are common in real-world applications. The existing distributed Node2Vec on Spark incurs significant space and time overhead. It runs out of memory even for mid-sized graphs with millions of vertices. Moreover, it considers at most 30 edges for every vertex in generating random walks, causing poor result quality. In this paper, we propose Fast-Node2Vec, a family of efficient Node2Vec random walk algorithms on a Pregel-like graph computation framework. Fast-Node2Vec computes transition probabilities during random walks to reduce memory space consumption and computation overhead for large-scale graphs. The Pregel-like scheme avoids the space and time overhead of Spark's read-only RDD structures and shuffle operations. Moreover, we propose a number of optimization techniques to further reduce the computation overhead for popular vertices with large degrees. Empirical evaluation shows that Fast-Node2Vec is capable of computing Node2Vec on graphs with billions of vertices and edges on a mid-sized machine cluster. Compared to Spark-Node2Vec, Fast-Node2Vec achieves 7.7–122x speedups. Copyright © 2018, The Authors. All rights reserved.

Keywords: Graphic methods
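
The abstract above says Fast-Node2Vec computes transition probabilities during the walk instead of precomputing them. The single-machine sketch below shows what computing them on demand can look like, using the standard Node2Vec bias with return parameter p and in-out parameter q. It is not the Fast-Node2Vec or Pregel implementation; the adjacency-dict layout, the function names and the tiny example graph are assumptions for illustration.

```python
import random


def transition_probs(adj, prev, cur, p, q):
    """Node2Vec probabilities for the step after a move prev -> cur, computed on demand."""
    weights = {}
    for nxt, w in adj[cur].items():
        if nxt == prev:
            bias = 1.0 / p   # step back to the previous vertex
        elif nxt in adj[prev]:
            bias = 1.0       # neighbor shared with the previous vertex
        else:
            bias = 1.0 / q   # move farther away from the previous vertex
        weights[nxt] = w * bias
    total = sum(weights.values())
    return {v: x / total for v, x in weights.items()}


def random_walk(adj, start, length, p, q, rng):
    """Biased walk of up to `length` vertices; nothing is precomputed or cached."""
    walk = [start]
    while len(walk) < length and adj[walk[-1]]:
        cur = walk[-1]
        if len(walk) == 1:
            # First step has no previous vertex: use plain edge weights.
            candidates = list(adj[cur])
            weights = [adj[cur][v] for v in candidates]
        else:
            probs = transition_probs(adj, walk[-2], cur, p, q)
            candidates, weights = list(probs), list(probs.values())
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk


# Small undirected example graph with unit edge weights (illustrative only).
adj = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0, "d": 1.0},
    "c": {"a": 1.0, "b": 1.0},
    "d": {"b": 1.0},
}
probs = transition_probs(adj, prev="a", cur="b", p=2.0, q=0.5)
walk = random_walk(adj, "a", length=5, p=2.0, q=0.5, rng=random.Random(0))
print(probs, walk)
```

Memory use stays proportional to the graph itself, since no per-edge-pair probability table is stored; the cost is recomputing the bias at every step, which is the overhead the abstract's optimizations for high-degree vertices aim to reduce.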

See and think: disentangling semantic scene completion

Proceedings of the 32nd International Conference on Neural Information Processing Systems

Authors: Shice Liu; Yu Hu; Yiming Zeng; Qiankun Tang; Beibei Jin; Yinhe Han; Xiaowei Li

Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, and University of Chinese Academy of Sciences

Semantic scene completion predicts the volumetric occupancy and object categories of a 3D scene, which helps intelligent agents understand and interact with their surroundings. In this work, we propose a disentangled framework that sequentially carries out 2D semantic segmentation, 2D-3D reprojection and 3D semantic scene completion. This three-stage framework has three advantages: (1) explicit semantic segmentation significantly boosts performance; (2) flexible ways of fusing sensor data bring good extensibility; (3) progress in any subtask will improve the overall performance. Experimental results show that, whether the input is a single depth image or RGB-D, our framework generates high-quality semantic scene completion and outperforms state-of-the-art approaches on both synthetic and real datasets.

Keywords:

NTIRE 2020 Challenge on Image and Video Deblurring

arXiv, 2020

Authors: Nah, Seungjun; Son, Sanghyun; Timofte, Radu; Lee, Kyoung Mu; Tseng, Yu; Xu, Yu-Syuan; Chiang, Cheng-Ming; Tsai, Yi-Min; Brehm, Stephan; Scherer, Sebastian; Xu, Dejia; Chu, Yihao; Sun, Qingyan; Jiang, Jiaqin; Duan, Lunhao; Yao, Jian; Purohit, Kuldeep; Suin, Maitreya; Rajagopalan, A.N.; Ito, Yuichi; Hrishikesh, P.S.; Puthussery, Densen; Akhil, K.A.; Jiji, C.V.; Kim, Guisik; Deepa, P.L.; Xiong, Zhiwei; Huang, Jie; Liu, Dong; Kim, Sangmin; Nam, Hyungjoon; Kim, Jisu; Jeong, Jechang; Huang, Shihua; Fan, Yuchen; Yu, Jiahui; Yu, Haichao; Huang, Thomas S.; Zhou, Ya; Li, Xin; Liu, Sen; Chen, Zhibo; Dutta, Saikat; Das, Sourya Dipta; Garg, Shivam; Sprague, Daniel; Patel, Bhrij; Huck, Thomas

Affiliations: Department of ECE, ASRI, SNU, Republic of Korea; Computer Vision Lab, ETH Zurich, Switzerland; MediaTek Inc; University of Augsburg, Chair for Multimedia Computing and Computer Vision Lab, Germany; Peking University, China; Beijing University of Posts and Telecommunications, China; Beijing Jiaotong University, China; Wuhan University, China; Indian Institute of Technology Madras, India; Vermilion; College of Engineering Trivandrum, India; CVML, Chung-Ang University, Republic of Korea; APJ Abdul Kalam Technological University, India; University of Science and Technology of China, China; Image Communication Signal Processing Laboratory, Hanyang University, Republic of Korea; Southern University of Science and Technology, China; University of Illinois at Urbana-Champaign, United States; CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, China; IIT Madras; Jadavpur University, India; University of Texas Austin, United States; Duke University Computer Science Department, United States

Motion blur is one of the most common degradation artifacts in dynamic scene photography. This paper reviews the NTIRE 2020 Challenge on Image and Video Deblurring. In this challenge, we present the evaluation results from 3 competition tracks as well as the proposed solutions. Track 1 aims to develop single-image deblurring methods focusing on restoration quality. In Track 2, the image deblurring methods are executed on a mobile platform to find a balance between running speed and restoration accuracy. Track 3 targets video deblurring methods that exploit the temporal relation between input frames. The three tracks had 163, 135, and 102 registered participants, respectively, and 9, 4, and 7 teams competed in the final testing phase. The winning methods demonstrate state-of-the-art performance on image and video deblurring tasks. Copyright © 2020, The Authors. All rights reserved.

Keywords: Image enhancement

BENCHIP: Benchmarking Intelligence Processors

Journal of Computer Science & Technology, 2018, Vol. 33, No. 1, pp. 1-23

Authors: Jin-Hua Tao; Zi-Dong Du; Qi Guo; Hui-Ying Lan; Lei Zhang; Sheng-Yuan Zhou; Ling-Jie Xu; Cong Liu; Hai-Feng Liu; Shah Tang; Allen Rush; Willian Chen; Shao-Li Liu; Yun-Ji Chen; Tian-Shi Chen

Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; Intelligent Processor Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Cambricon Ltd., Beijing 100190, China; Alibaba Infrastructure Service, Alibaba Group, Hangzhou 311121, China; Iflytek Co. Ltd., Hefei 230088, China; Beijing Jingdong Century Trading Co. Ltd., Beijing 100176, China; RDA Microelectronics Inc., Shanghai 201203, China; Advanced Micro Devices Inc., Sunnyvale, CA 94085, U.S.A.

The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors due to their non-diversity and nonrepresentativeness. Also, the lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BENCHIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BENCHIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks. They are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect various characteristics of the evaluated intelligence processors. BENCHIP is utilized for evaluating various hardware platforms, including CPUs, GPUs, and accelerators. BENCHIP will be open-sourced soon.

Keywords: deep learning; intelligence processor; benchmark