Search Results - Inner Mongolia University Library

The static parallel distribution algorithms for hybrid density-functional calculations in HONPAS package

arXiv, 2020

Authors: Qin, Xinming; Shang, Honghui; Xu, Lei; Hu, Wei; Yang, Jinlong; Li, Shigang; Zhang, Yunquan. Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Hefei National Laboratory for Physical Sciences at Microscale, Department of Chemical Physics, Synergetic Innovation Center of Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026, China

Hybrid density-functional calculation is one of the most commonly adopted electronic structure theories in computational chemistry and materials science because of its balance between accuracy and computational cost. Recently, we developed a novel scheme called NAO2GTO to achieve linear-scaling (Order-N) calculations for hybrid density functionals (Shang et al., 2011). In our scheme, the most time-consuming step is the calculation of the electron repulsion integrals (ERIs), so distributing these ERIs evenly in a parallel implementation is particularly important. Here, we present two static scalable distributed algorithms for the ERI computation. In the first, the ERIs are distributed over ERI shell pairs. In the second, the ERIs are distributed over ERI shell quartets. In both algorithms, the ERI calculations are independent of each other, so the communication time is minimized. We present speedup results to demonstrate the performance of these static parallel distributed algorithms in the Hefei Order-N Packages for Ab Initio Simulations (HONPAS) (Qin et al., 2014). Copyright © 2020, The Authors. All rights reserved.

Keywords: Calculations
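
The two static schemes named in the abstract above can be illustrated with a minimal sketch. It is not the HONPAS implementation: the shell count, the round-robin assignment rule, and all function names are assumptions chosen for illustration, and no integrals are actually evaluated.

```python
from itertools import combinations_with_replacement


def shell_pairs(n_shells):
    """Return all unique shell pairs (i, j) with i <= j."""
    return list(combinations_with_replacement(range(n_shells), 2))


def shell_quartets(n_shells):
    """Return unique quartets as (bra_pair, ket_pair), bra not after ket."""
    return list(combinations_with_replacement(shell_pairs(n_shells), 2))


def distribute_round_robin(items, n_ranks):
    """Static assignment (assumed rule): item k goes to rank k % n_ranks."""
    buckets = [[] for _ in range(n_ranks)]
    for k, item in enumerate(items):
        buckets[k % n_ranks].append(item)
    return buckets


n_shells, n_ranks = 4, 3  # hypothetical problem size and process count
pairs = shell_pairs(n_shells)
quartets = shell_quartets(n_shells)
pair_counts = [len(b) for b in distribute_round_robin(pairs, n_ranks)]
quartet_counts = [len(b) for b in distribute_round_robin(quartets, n_ranks)]
print(len(pairs), pair_counts)
print(len(quartets), quartet_counts)
```

Because every rank can derive its own work list from its rank number alone, no communication is needed to hand out the work. Quartets are the finer-grained unit, so there are more items to spread across ranks.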

The dynamic parallel distribution algorithm for hybrid density-functional calculations in HONPAS package

arXiv, 2020

Authors: Shang, Honghui; Xu, Lei; Wu, Baodong; Qin, Xinming; Zhang, Yunquan; Yang, Jinlong. Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Hefei National Laboratory for Physical Sciences at Microscale, Department of Chemical Physics, Synergetic Innovation Center of Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026, China

This work presents a dynamic parallel distribution scheme for Hartree-Fock exchange (HFX) calculations based on the real-space NAO2GTO framework. The most time-consuming step, the calculation of the electron repulsion integrals (ERIs), is perfectly load-balanced with a two-level master-worker dynamic parallel scheme. The density matrix and the HFX matrix are both stored in sparse format, and network communication time is minimized by communicating only the indices of the batched ERIs and the final sparse form of the HFX matrix. The performance of this dynamic scalable distributed algorithm is demonstrated on several large-scale hybrid density-functional calculations on Tianhe-2 supercomputers, covering both molecular and solid-state systems of multiple dimensions, and shows good scalability. Copyright © 2020, The Authors. All rights reserved.

Keywords: Density functional theory
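
To show why dynamic dispatch helps when batch costs are uneven, the following minimal sketch compares a static round-robin assignment with a single-level master-worker dispatch in a toy timing model. It is an illustration only: the batch costs are made up, the paper's scheme has two levels rather than one, and the function names are hypothetical.

```python
import heapq


def static_makespan(costs, n_workers):
    """Finish time when batch k is pre-assigned to worker k % n_workers."""
    loads = [0.0] * n_workers
    for k, cost in enumerate(costs):
        loads[k % n_workers] += cost
    return max(loads)


def dynamic_makespan(costs, n_workers):
    """Finish time when the master hands the next batch to the first idle worker."""
    # Heap entries are (time at which the worker becomes idle, worker id).
    idle_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(idle_at)
    finish = 0.0
    for cost in costs:
        start, worker = heapq.heappop(idle_at)
        done = start + cost
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, worker))
    return finish


# Made-up batch costs: every third batch is expensive.
costs = [8, 1, 1, 8, 1, 1, 8, 1, 1]
static_time = static_makespan(costs, 3)
dynamic_time = dynamic_makespan(costs, 3)
print(static_time, dynamic_time)
```

In this toy case the static assignment sends all three expensive batches to the same worker, while the dynamic dispatch spreads them out. Only batch indices need to travel from master to worker, which mirrors the abstract's point about keeping communication small.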

A knowledge acquisition automatizing framework from literature exemplified by Na+ activation energy prediction of NASICON solid-state electrolyte

Energy Storage Materials, 2025, Vol. 80

Authors: Yue Liu; Dahui Liu; Zhengwei Yang; Xianyuan Ge; Wenxuan Yao; Jie Wu; Maxim Avdeev; Siqi Shi. Affiliations: State Key Laboratory of Materials for Advanced Nuclear Energy & School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; Shanghai Engineering Research Center of Intelligent Computing System, Shanghai 200444, China; Australian Nuclear Science and Technology Organisation, Sydney 2232, Australia; School of Chemistry, The University of Sydney, Sydney 2006, Australia; State Key Laboratory of Materials for Advanced Nuclear Energy & School of Materials Science and Engineering, Shanghai University, Shanghai 200444, China; Materials Genome Institute, Shanghai University, Shanghai 200444, China

Materials science literature contains a vast amount of structure-activity relationship knowledge crucial for materials discovery and design. However, automatic extraction of domain knowledge from literature remains challenging due to its unstructured and heterogeneous format. Herein, we propose a framework for automating knowledge acquisition, which involves a materials entity-aware relational extraction model (MatRE) to mine triples, an approach to construct a knowledge graph (KG) for the detection of associations among triples, as well as inference and representation of structure-activity relationships in a machine learning (ML)-compatible format. We demonstrate its application in predicting sodium ion activation energy for the NASICON solid-state electrolyte (SSE) system. MatRE, trained on a NASICON SSE dataset, achieves an F1-score of 0.80 and is used to extract 260,475 entity-relation triples from 1,808 scientific publications. Furthermore, embedding 24 knowledge bullets from the KG into the data pre-processing and feature engineering stages improves the performance and interpretability of six common ML models by up to 25.7%. This work offers key insights into automatic knowledge acquisition from literature and heralds a new paradigm for AI-assisted materials genome engineering driven by both data and knowledge.

Keywords: Information extraction; Knowledge graph; Machine learning; Materials science
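
As a rough illustration of how extracted triples can be linked in a knowledge graph to expose associations, the sketch below stores (head, relation, tail) triples in an adjacency map and chains two of them. It is not the MatRE model or the authors' KG pipeline: the triples are made-up placeholders, and the function names are hypothetical.

```python
from collections import defaultdict


def build_kg(triples):
    """Index triples by head entity: head -> list of (relation, tail)."""
    kg = defaultdict(list)
    for head, relation, tail in triples:
        kg[head].append((relation, tail))
    return kg


def two_hop_paths(kg, start):
    """Chain two triples that share an intermediate entity."""
    paths = []
    for rel1, mid in kg.get(start, []):
        for rel2, end in kg.get(mid, []):
            paths.append((start, rel1, mid, rel2, end))
    return paths


# Placeholder triples for illustration only, not extracted from any paper.
triples = [
    ("dopant_A", "increases", "bottleneck_size"),
    ("bottleneck_size", "lowers", "activation_energy"),
    ("dopant_B", "decreases", "lattice_volume"),
]
kg = build_kg(triples)
paths = two_hop_paths(kg, "dopant_A")
print(paths)
```

A chained path of this kind is one simple way to turn isolated triples into a candidate structure-activity statement that could later be encoded as a feature.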

Energy Saving Strategy of Power System Cluster Based on Container Virtualization

IEEE Asia Power and Energy Engineering Conference (APEEC)

Authors: Ran Zheng; Hao Wang; Hai Jin; Dechao Xu; Yong Chen; Xiaomeng Li; Yufei Rao; Zhenan Zhang. Affiliations: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; State Key Laboratory of Power Grid Safety and Energy Conservation, China Electric Power Research Institute, Haidian District, Beijing, China; State Grid Henan Electric Power Company Electric Power Research Institute, Zhengzhou, China

ISBN: (digital) 9781728167824

ISBN: (print) 9781728167831

With the continuous development of power grids, the scale of supercomputing clusters has gradually increased to carry a large number of power system simulation calculations, and the problem of high energy consumption has appeared. To solve this problem, we propose a container virtualization-based supercomputing cluster for power systems. We analyze the impact of containers on power simulation calculations and compare the energy consumption effects of various container scheduling and migration algorithms on clusters. Experiments show that, compared to hypervisor-based virtual machines, which consume massive resources and reduce performance by 28.4%, the performance degradation of containers on power simulation calculations is 1.3%, which can be ignored. The energy consumption of the load-concentration or resource-and-load-balance container scheduling algorithms is up to 4.0% lower and at least 2.2% lower than that of other algorithms. In container migration, the method combining an autoregressive model with the most-correlation and resource-and-load-balance algorithms is better than other methods: it not only minimizes energy consumption but also has the lowest number of migrations and SLA violations. Experiments verify the feasibility and advantages of container migration in power system computing clusters.
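
The intuition behind load-concentration scheduling can be sketched with a toy placement and power model. Everything here is an assumption for illustration: the first-fit rule, the naive spreading baseline, the linear power model, and the wattage figures are not taken from the paper.

```python
def place_concentrated(demands, n_hosts, capacity=1.0):
    """First-fit: pack each container onto the lowest-numbered host with room."""
    loads = [0.0] * n_hosts
    for demand in demands:
        for host in range(n_hosts):
            if loads[host] + demand <= capacity:
                loads[host] += demand
                break
        else:
            raise ValueError("no host can fit this container")
    return loads


def place_spread(demands, n_hosts, capacity=1.0):
    """Naive baseline: always place on the currently least-loaded host."""
    loads = [0.0] * n_hosts
    for demand in demands:
        host = min(range(n_hosts), key=loads.__getitem__)
        if loads[host] + demand > capacity:
            raise ValueError("no host can fit this container")
        loads[host] += demand
    return loads


def cluster_power(loads, idle_w=100.0, peak_w=200.0, capacity=1.0):
    """Assumed linear model; hosts with no load are treated as powered off."""
    return sum(idle_w + (peak_w - idle_w) * load / capacity
               for load in loads if load > 0)


demands = [0.25, 0.25, 0.25, 0.25]  # hypothetical CPU shares
concentrated_w = cluster_power(place_concentrated(demands, 4))
spread_w = cluster_power(place_spread(demands, 4))
print(concentrated_w, spread_w)
```

Under this model, packing lets idle hosts be switched off, so the fixed idle cost is paid once instead of four times. Real schedulers also have to weigh contention and SLA violations, which this sketch ignores.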

Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent Edge Devices

arXiv, 2020

Authors: Li, Guangli; Ma, Xiu; Wang, Xueying; Liu, Lei; Xue, Jingling; Feng, Xiaobing. Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China; School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia

The increasing computational cost of deep neural network models limits the applicability of intelligent applications on resource-constrained edge devices. While a number of neural network pruning methods have been proposed to compress the models, prevailing approaches focus only on parametric operators (e.g., convolution), which may miss optimization opportunities. In this paper, we present a novel fusion-catalyzed pruning approach, called FUPRUNER, which simultaneously optimizes the parametric and non-parametric operators for accelerating neural networks. We introduce an aggressive fusion method to equivalently transform a model, which extends the optimization space of pruning and enables non-parametric operators to be pruned in a similar manner as parametric operators, and a dynamic filter pruning method is applied to decrease the computational cost of models while retaining the accuracy requirement. Moreover, FUPRUNER provides configurable optimization options for controlling fusion and pruning, allowing much more flexible performance-accuracy trade-offs to be made. Evaluation with state-of-the-art residual neural networks on five representative intelligent edge platforms, Jetson TX2, Jetson Nano, Edge TPU, NCS, and NCS2, demonstrates the effectiveness of our approach, which can accelerate the inference of models on CIFAR-10 and ImageNet datasets. Copyright © 2020, The Authors. All rights reserved.

Keywords: Economic and social effects

A Non-Stop Double Buffering Mechanism for Dataflow Architecture

Journal of Computer Science & Technology, 2018, Vol. 33, No. 1, pp. 145-157

Authors: Xu Tan; Xiao-Wei Shen; Xiao-Chun Ye; Da Wang; Dong-Rui Fan; Lunkai Zhang; Wen-Ming Li; Zhi-Min Zhang; Zhi-Min Tang. Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, China; Department of Computer Science, The University of Chicago, Chicago, IL 60637, U.S.A.

Double buffering is an effective mechanism to hide the latency of data transfers between on-chip and off-chip memory. However, in dataflow architecture, the swapping of two buffers during the execution of many tiles decreases performance because of repetitive filling and draining of the dataflow accelerator. In this work, we propose a non-stop double buffering mechanism for dataflow architecture. The proposed non-stop mechanism assigns tiles to the processing element array without stopping the execution of processing elements, by optimizing the control logic in the dataflow architecture. Moreover, we propose a work-flow program to cooperate with the non-stop double buffering mechanism. After optimizing both the control logic and the work-flow program, the filling and draining of the array needs to be done only once across the execution of all tiles belonging to the same dataflow graph. Experimental results show that the proposed double buffering mechanism for dataflow architecture achieves a 16.2% average efficiency improvement over that without the optimization.

Keywords: non-stop; double buffering; dataflow architecture; high-performance computing
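
The saving described in the abstract above, filling and draining the array once instead of once per tile, can be put into a back-of-the-envelope cycle model. The model and the cycle counts are assumptions for illustration, not measurements from the paper.

```python
def cycles_with_swap_stalls(n_tiles, fill, compute, drain):
    """Every tile pays its own fill and drain when the array stops at each swap."""
    return n_tiles * (fill + compute + drain)


def cycles_non_stop(n_tiles, fill, compute, drain):
    """Fill and drain are paid once across all tiles of the same dataflow graph."""
    return fill + n_tiles * compute + drain


# Hypothetical cycle counts per tile.
n_tiles, fill, compute, drain = 10, 20, 100, 20
stalled = cycles_with_swap_stalls(n_tiles, fill, compute, drain)
non_stop = cycles_non_stop(n_tiles, fill, compute, drain)
saved = stalled - non_stop
print(stalled, non_stop, saved)
```

In this model the saving is (n_tiles - 1) * (fill + drain), so it grows with the number of tiles and with the depth of the array pipeline.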

Optimizing Multi-Dimensional Packet Classification for Multi-Core Systems

Journal of Computer Science & Technology, 2018, Vol. 33, No. 5, pp. 1056-1071

Authors: Tong Shen; Da-Fang Zhang; Gao-Gang Xie; Xin-Yi Zhang. Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; Network Technology Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

Packet classification has been studied for decades; it classifies packets into specific flows based on a given rule set. Since software-defined networking was proposed, a recent trend in packet classification has been to scale the five-tuple model to multi-tuple. In general, packet classification on multiple fields is a complex problem. Although most existing software-based algorithms have proved extraordinary in practice, they are only suitable for the classic five-tuple model and are difficult to scale up. Meanwhile, hardware-specific solutions are inflexible and expensive, and some of them are power-consuming. In this paper, we propose a universal multi-dimensional packet classification approach for multi-core systems. In our approach, novel data structures and four decomposition-based algorithms are designed to optimize the classification and updating of rules. For multi-field rules, a rule set is cut into several parts according to the number of fields. Each part works independently. In this way, the fields are searched in parallel and all the partial results are merged at the end. To demonstrate the feasibility of our approach, we implement a prototype and evaluate its throughput and latency. Experimental results show that our approach achieves 40% higher throughput than other decomposition-based algorithms and, on average, 43% lower latency for incremental rule updates than the other algorithms. Furthermore, our approach reduces memory consumption by 39% on average and has good scalability.

Keywords: multi-dimensional; multi-core; packet classification
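
The decomposition idea described in the abstract above, searching each field independently and then merging the partial results, can be sketched as follows. This is a generic illustration, not the paper's data structures or any of its four algorithms: the linear per-field scan, the bitmask merge, the rule set, and the field names are all assumptions, and the per-field lookups run sequentially here rather than on separate cores.

```python
def field_matches(rules, field, value):
    """Return a bitmask with bit r set if rule r's range on `field` holds `value`."""
    mask = 0
    for rule_id, rule in enumerate(rules):
        low, high = rule[field]
        if low <= value <= high:
            mask |= 1 << rule_id
    return mask


def classify(rules, packet):
    """Intersect the per-field masks; the lowest rule id has the highest priority."""
    mask = (1 << len(rules)) - 1
    for field, value in packet.items():
        mask &= field_matches(rules, field, value)  # independent per-field lookup
    if mask == 0:
        return None
    return (mask & -mask).bit_length() - 1  # index of the lowest set bit


# Hypothetical three-field rule set, listed in priority order.
rules = [
    {"src": (10, 20), "dst": (0, 5), "port": (80, 80)},
    {"src": (0, 255), "dst": (0, 255), "port": (0, 1023)},
    {"src": (0, 255), "dst": (0, 255), "port": (0, 65535)},
]
first = classify(rules, {"src": 15, "dst": 3, "port": 80})
second = classify(rules, {"src": 30, "dst": 3, "port": 443})
third = classify(rules, {"src": 30, "dst": 3, "port": 8080})
missing = classify(rules, {"src": 300, "dst": 3, "port": 80})
print(first, second, third, missing)
```

Since each call to `field_matches` touches only one field, the lookups have no data dependence on each other, which is what makes them candidates for running on separate cores before the final merge.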

Cooperative communication based connectivity recovery for UAV networks

2019 ACM Turing Celebration Conference - China, ACM TURC 2019

Authors: Tian, Wen; Jiao, Zhenzhen; Liu, Min; Zhang, Meng; Li, Dong. Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; RDA FOA ART-CN1, Corporate Technology, Siemens Ltd., Beijing 100102, China

ISBN: (print) 9781450371582

UAV networks often partition into separated clusters due to high node and link dynamics. As a result, network connectivity recovery is an important issue in this area. Existing solutions always need excessive movement of nodes and thus lead to low recovery efficiency in terms of time and energy consumption. In this paper, we study for the first time how to utilize cooperative communication technology to improve connectivity recovery efficiency in UAV networks. We propose a Cooperative Communication based Connectivity Recovery algorithm for UAV Networks, named C3RUN. The key novelty of C3RUN is that nodes can proactively find better locations to establish more efficient cooperative communication links than those obtained by passively leveraging existing opportunities. We conduct extensive simulations to evaluate the performance of C3RUN. The simulation results reveal that, compared with existing work, C3RUN not only achieves connectivity recovery with fewer nodes and shorter distances to move, but also always finishes recovery in less time. Furthermore, C3RUN can achieve a 100% success ratio for connectivity recovery. © 2019 Association for Computing Machinery.

Keywords: Cooperative communication

AIBench training: Balanced industry-standard AI training benchmarking

arXiv, 2020

Authors: Tang, Fei; Gao, Wanling; Zhan, Jianfeng; Lan, Chuanxin; Wen, Xu; Wang, Lei; Luo, Chunjie; Cao, Zheng; Xiong, Xingwang; Jiang, Zihan; Hao, Tianshu; Fan, Fanda; Zhang, Fan; Huang, Yunyou; Chen, Jianan; Du, Mengjia; Ren, Rui; Zheng, Chen; Zheng, Daoyi; Tang, Haoning; Zhan, Kunlin; Wang, Biao; Kong, Defei; Yu, Minghe; Tan, Chongkang; Li, Huan; Tian, Xinhui; Li, Yatao; Shao, Junchao; Wang, Zhenyu; Wang, Xiaoyu; Dai, Jiahui; Ye, Hainan. Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Alibaba Baidu Tencent *** NetEase ByteDance Zhihu China Lenovo Paypal Moqi Microsoft Research Asia *** CloudTa Intellifusion

Earlier-stage evaluations of a new AI architecture/system need affordable AI benchmarks. Using only a few AI component benchmarks, such as MLPerf, in the other stages may lead to misleading conclusions. Moreover, the learning dynamics are not well understood, and the benchmarks' shelf-life is short. This paper proposes a balanced benchmarking methodology. We use real-world benchmarks to cover, to the greatest possible extent, the factor space that impacts the learning dynamics. After performing an exhaustive survey on Internet service AI domains, we identify and implement nineteen representative AI tasks with state-of-the-art models. For repeatable performance ranking (RPR subset) and workload characterization (WC subset), we keep two subsets to a minimum for affordability. We contribute by far the most comprehensive AI training benchmark suite. The evaluations show: (1) AIBench Training (v1.1) outperforms MLPerf Training (v0.7) in terms of diversity and representativeness of model complexity, computational cost, convergence rate, computation and memory access patterns, and hotspot functions; (2) against the AIBench full benchmarks, its RPR subset shortens the benchmarking cost by 64% while maintaining the primary workload characteristics; (3) the performance ranking shows that a single-purpose AI accelerator like the TPU with the optimized TensorFlow framework performs better than GPUs while losing the latter's general support for various AI models. The specification, source code, and performance numbers are available from the AIBench homepage https: //***/aibench-training/***. Copyright © 2020, The Authors. All rights reserved.

Keywords: Benchmarking