Search Results - Inner Mongolia University Library

International Conference on Parallel Processing (ICPP)

Authors: Feng Wang, Hao Jiang, Ke Zuo, Xing Su, Jingling Xue, Canqun Yang. Affiliations: School of Computer Science, National University of Defense Technology, Changsha, China; School of Computer Science and Engineering, University of New South Wales, NSW, Australia; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

This paper presents the design and implementation of a highly efficient double-precision general matrix multiplication (DGEMM) based on OpenBLAS for 64-bit ARMv8 eight-core processors. We adopt a theory-guided approach: we first develop a performance model for this architecture and then use it to guide our exploration. The key enabler for a highly efficient DGEMM is a highly optimized inner kernel, GEBP, developed in assembly language. We have obtained GEBP by (1) maximizing its compute-to-memory access ratios across all levels of the memory hierarchy in the ARMv8 architecture, with its performance-critical block sizes determined analytically, and (2) optimizing its computations through loop unrolling, instruction scheduling and software-implemented register rotation, and by taking advantage of A64 instructions to support efficient FMA operations, data transfers and prefetching. We have compared our DGEMM implemented in OpenBLAS with another implemented in ATLAS (also built on a highly optimized GEBP in assembly). Our implementation outperforms the one in ATLAS, improving the peak performance (efficiency) of DGEMM from 3.88 Gflops (80.9%) to 4.19 Gflops (87.2%) on one core and from 30.4 Gflops (79.2%) to 32.7 Gflops (85.3%) on eight cores. These results translate into substantial performance (efficiency) improvements of 7.79% on one core and 7.70% on eight cores. In addition, the efficiency of our implementation on one core is very close to the theoretical upper bound of 91.5% obtained from micro-benchmarking. Our parallel implementation achieves good performance and scalability under varying thread counts across the range of matrix sizes evaluated.

Keywords: Registers; Kernel; Computational modeling; Program processors; Assembly; Memory management
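
As a quick check on the figures quoted in the abstract above, the 7.79% and 7.70% improvements appear to match the relative gain in efficiency (for example, 80.9% to 87.2%) rather than the relative gain in raw Gflops. The sketch below only redoes that arithmetic; the helper name `relative_gain` is illustrative and does not come from the paper.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Efficiency figures (percent of peak) quoted in the abstract.
one_core_eff = relative_gain(80.9, 87.2)
eight_core_eff = relative_gain(79.2, 85.3)

# Raw Gflops figures quoted in the abstract, for comparison.
one_core_gflops = relative_gain(3.88, 4.19)
eight_core_gflops = relative_gain(30.4, 32.7)

print(f"efficiency gain: {one_core_eff:.2f}% (1 core), {eight_core_eff:.2f}% (8 cores)")
print(f"Gflops gain:     {one_core_gflops:.2f}% (1 core), {eight_core_gflops:.2f}% (8 cores)")
```

The efficiency-based values round to 7.79% and 7.70%, which agrees with the abstract; the Gflops-based values come out slightly different because the quoted Gflops numbers are themselves rounded.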

Performance Optimization of a CFD Application on Intel Multicore and Manycore Architectures

10th Annual Conference of Advanced Computer Architecture, ACA 2014

Authors: Che, Yonggang; Zhang, Lilun; Wang, Yongxian; Xu, Chuanfu; Liu, Wei; Cheng, Xinghua. Affiliation: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN: (print) 9783662444900

This paper reports our experience optimizing the performance of a high-order, high-accuracy Computational Fluid Dynamics (CFD) application (HOSTA) on a state-of-the-art multicore processor and the emerging Intel Many Integrated Core (MIC) coprocessor. We focus on effective loop vectorization and memory access optimization. A series of techniques, including data structure transformations, procedure inlining, compiler SIMDization, OpenMP loop collapsing, and the use of Huge Pages, are explored. Detailed execution times and event counts from Performance Monitoring Units are measured. The results show that our optimizations have improved the performance of HOSTA by 1.61× on a compute node with two Intel Sandy Bridge processors and by 1.97× on an Intel Knights Corner coprocessor, the publicly available MIC product. The microarchitecture-level effects of these optimizations are also discussed. © Springer-Verlag Berlin Heidelberg 2014.

Keywords: Computational fluid dynamics

Maximizing the information diffusion opportunity in the cyber-physical network

International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks (QSHINE)

Authors: Hongliang Lu, Xuan Dong, Wenxiang Li, Saohe Lv, Xiaodong Wang, Wei Chen. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Hunan, China; School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, China; Engineering Research Center for Metallurgical Automation and Detecting Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan, China

ISBN: (print) 9781479982172

Smart objects such as smart watches and smart phones are changing our daily lives. Their abundant sensing, communication and computation abilities integrate the cyber world with the physical world. Focusing on this wide-ranging integrated network, we introduce a statistics-based strategy to obtain a special kind of link between objects, the statistical probability communication link. To maximize the information spread probability for grouped people, this paper introduces a distributed yet efficient algorithm, named DMPID, for finding a sub-network through which to spread people-oriented information. The DMPID algorithm takes both the size of the selection and the information spread probability into account and strikes a balance between the two parameters. Extensive simulations show that the DMPID algorithm performs well in different distributed networks.

Keywords: distributed algorithms; probability; statistical analysis; telecommunication networks; DMPID algorithm; cyber world; cyber-physical network; distributed maximizing probability of information diffusion; information diffusion opportunity; information spread probability; statistical probability communication link

DREAMS: Dynamic resource allocation for MapReduce with data skew

IFIP/IEEE International Symposium on Integrated Network Management

Authors: Zhihong Liu, Qi Zhang, Mohamed Faten Zhani, Raouf Boutaba, Yaping Liu, Zhenghu Gong. Affiliations: College of Computer, National University of Defense Technology, Changsha, China; David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan, China

ISBN: (print) 9781479982424

MapReduce has become a popular model for large-scale data processing in recent years. However, existing MapReduce schedulers still suffer from an issue known as partitioning skew, where the output of map tasks is unevenly distributed among reduce tasks. In this paper, we present DREAMS, a framework that provides run-time partitioning skew mitigation. Unlike previous approaches that try to balance the workload of reducers by repartitioning the intermediate data assigned to each reduce task, DREAMS copes with partitioning skew by adjusting task run-time resource allocation. We show that our approach allows DREAMS to eliminate the overhead of data repartitioning. Through experiments using both real and synthetic workloads running on an 11-node virtualized Hadoop cluster, we show that DREAMS can effectively mitigate the negative impact of partitioning skew, thereby improving job performance by up to 20.3%.

Keywords: Resource management; Containers; Predictive models; Mathematical model; Monitoring; Biomedical monitoring; Yarn
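
To make the notion of partitioning skew in the abstract above concrete, here is a minimal sketch of how a hash partitioner can send most of the map output to a single reducer when a few keys dominate. The key frequencies, the CRC32-based `partition` function and the reducer count are all hypothetical choices for illustration; none of this reproduces the DREAMS implementation or Hadoop's own partitioner.

```python
import zlib
from typing import Iterable, List


def partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner: the same key always maps to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys: Iterable[str], num_reducers: int) -> List[int]:
    """Count how many intermediate records each reducer receives."""
    loads = [0] * num_reducers
    for key in keys:
        loads[partition(key, num_reducers)] += 1
    return loads


# Hypothetical map output: one very frequent key and a few rare ones.
map_output_keys = ["the"] * 80 + ["of"] * 10 + ["skew"] * 5 + ["map"] * 5

loads = reducer_loads(map_output_keys, num_reducers=4)
print("records per reducer:", loads)
print("largest share of the data:", max(loads) / sum(loads))
```

Because every record with the key "the" lands on one reducer, that reducer receives at least 80% of the data regardless of how the other keys hash. This is the imbalance that, according to the abstract, DREAMS addresses by adjusting per-task resource allocation instead of repartitioning the data.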

High-energy-density electron beam from interaction of two successive laser pulses with subcritical-density plasma

Physical Review Accelerators and Beams, 2016, Vol. 19, No. 2, pp. 021301-021301

Authors: J. W. Wang, W. Yu, M. Y. Yu, H. Xu, J. J. Ju, S. X. Luan, M. Murakami, M. Zepf, S. Rykovanov. Affiliations: Helmholtz Institute Jena, Jena 07743, Germany; State Key Laboratory of High Field Laser Physics, Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China; Institute for Fusion Theory and Simulation and the Department of Physics, Zhejiang University, Hangzhou 310027, China; Institute for Theoretical Physics I, Ruhr University, Bochum D-44780, Germany; National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China; Institute of Laser Engineering, Osaka University, Osaka 565-0871, Japan; Centre for Plasma Physics, School of Mathematics and Physics, Queen’s University Belfast, Belfast BT7 1NN, United Kingdom

It is shown by particle-in-cell simulations that a narrow electron beam with high energy and charge density can be generated in a subcritical-density plasma by two consecutive laser pulses. Although the first laser pulse dissipates rapidly, the second pulse can propagate for a long distance in the thin wake channel created by the first pulse and can further accelerate the preaccelerated electrons therein. Given that the second pulse also self-focuses, the resulting electron beam has a narrow waist and high charge and energy densities. Such beams are useful for enhancing the target-back space-charge field in target normal sheath acceleration of ions and bremsstrahlung sources, among others.

Keywords: Plasma acceleration & new acceleration techniques

Iaso: an autonomous fault-tolerant management system for supercomputers

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 378-390

Authors: Kai LU, Xiaoping WANG, Gen LI, Ruibo WANG, Wanqing CHI, Yongpeng LIU, Hongwei TANG, Hua FENG, Yinghui GAO. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China; ATR Laboratory, National University of Defense Technology, Changsha 410073, China

As system scale increases, the inherent reliability of supercomputers becomes lower and lower. The cost of fault handling and task recovery increases so rapidly that the reliability issue will soon harm the usability of supercomputers. This issue is referred to as the "reliability wall", which is regarded as a critical problem for current and future supercomputers. To address this problem, we propose an autonomous fault-tolerant system, named Iaso, in the MilkyWay-2 system. Iaso introduces the concept of autonomous management in supercomputers. Under autonomous management, the computer itself, rather than manpower, takes charge of the fault management work. Iaso automatically manages the whole lifecycle of faults, including fault detection, fault diagnosis, fault isolation, and task recovery. Iaso endows the MilkyWay-2 system with autonomous features such as self-awareness, self-diagnosis, self-healing, and self-protection. With the help of Iaso, the cost of fault handling in supercomputers is reduced from several hours to a few seconds. Iaso greatly improves the usability and reliability of the MilkyWay-2 system.

Keywords: supercomputer; autonomous management; fault tolerant; fault management; MilkyWay-2 system

RTC: Link schedule based MAC design in multi-hop wireless network

International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks (QSHINE)

Authors: Xuan Dong, Yinjia Huo, Chunsheng Zhu, Shaohe Lv, Wenxiang Li, Xiaodong Wang. Affiliations: Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada; National Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; Engineering Research Center for Metallurgical Automation and Detecting Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan, China

ISBN: (print) 9781479982172

The performance of an ad-hoc network is greatly limited by collisions due to hidden terminals. In this paper, we propose a receiver tracking contention (RTC) scheme, which achieves high throughput by allowing receivers to assist in channel contention. In RTC, the link is the basic unit for channel access contention. Specifically, the transmitter contends for the channel and the receiver announces potential collisions. Based on the INT message coding scheme, a transmitter and its corresponding receiver can be well coordinated. With this mechanism, hidden terminals are avoided and exposed terminals are encouraged to transmit simultaneously. Based on OFDM modulation, RTC packs several subcarriers into a subcontention unit and operates channel contention over multiple subcontention units. Furthermore, each subcontention unit maintains a transmission set, into which collision-free links are allowed to merge. The transmission set of each subcontention unit can thus be aggregated after each contention period. When subcontention unit i has the smallest index among the non-empty subcontention units, the transmission set of unit i wins the channel contention, and the transmitters of unit i start to transmit in the following data transmission period. Analysis and simulation results show that RTC achieves a notable throughput gain over Back2f, as high as 190%.

Keywords: Receivers; OFDM; Transmitters; Electronics packaging
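
The winner-selection rule stated in the abstract above (the non-empty subcontention unit with the smallest index wins) can be sketched as follows. The `Link` representation, the function name `winning_unit` and the example state are hypothetical; the abstract does not describe how RTC encodes transmission sets, so this only illustrates the rule itself.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation of a link as (transmitter, receiver).
Link = Tuple[str, str]


def winning_unit(transmission_sets: List[Set[Link]]) -> Optional[int]:
    """Return the index of the lowest-indexed non-empty subcontention unit."""
    for index, links in enumerate(transmission_sets):
        if links:
            return index
    return None  # every unit is empty, so no link wins this round


# Hypothetical state after one contention period: unit 0 is empty,
# unit 1 holds two links that were merged as collision-free.
units = [set(), {("A", "B"), ("C", "D")}, {("E", "F")}]

winner = winning_unit(units)
transmitters = sorted(tx for tx, _rx in units[winner])
print("winning unit:", winner)
print("transmitters allowed to send:", transmitters)
```

In this example unit 1 wins, so transmitters A and C would send in the following data transmission period, while the link in unit 2 waits for a later contention round.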

Tag recommendation for open source software

Frontiers of Computer Science, 2014, Vol. 8, No. 1, pp. 69-82

Authors: Tao WANG, Huaimin WANG, Gang YIN, Charles X. LING, Xiao LI, Peng ZOU. Affiliations: National Laboratory for Parallel and Distributed Processing, College of Computer; Department of Computer Science, The University of Western Ontario, London N6A5B7, Canada; Academy of Equipment, Beijing 101400, China

Open source software has become highly popular and is of great importance for most software engineering activities. To facilitate software organization and retrieval, tagging is extensively used in open source communities. However, finding the desired software through tags in communities such as Freecode and ohloh is still challenging because of tag insufficiency. In this paper, we propose TRG (tag recommendation based on semantic graph), a novel approach to discovering and enriching tags of open source software. First, we propose a semantic graph to model the semantic correlations between tags and the words in software descriptions. Then, based on the graph, we design an effective algorithm to recommend tags for software. Through comprehensive experiments on large-scale open source software datasets, comparing with several typical related works, we demonstrate the effectiveness and efficiency of our method in recommending proper tags.

Keywords: open source software; semantic graph; tag recommendation

Symmetric Non-negative Matrix Factorization Based Link Partition Method for Overlapping Community Detection

IEEE International Conference on Systems, Man and Cybernetics

Authors: Xiang Zhang, Naiyang Guan, Wenju Zhang, Xuhui Huang, Shuyi Wu, Zhigang Luo. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Hunan, P.R. China; Institute of Software, College of Computer, National University of Defense Technology, Hunan, P.R. China; Department of Computer Science and Technology, College of Computer, National University of Defense Technology, Hunan, P.R. China

ISBN: (print) 9781479986989

Partitioning links rather than nodes is effective in overlapping community detection (OCD) on complex networks. However, it incurs high CPU and memory overheads because the volume of links is huge, especially when the network is rather complex. In this paper, we propose a symmetric non-negative matrix factorization (SNMF) based link partition method called SNMF-Link to overcome this deficiency. In particular, SNMF-Link represents data in a lower-dimensional space spanned by the node-link incidence matrix. By solving a lighter SNMF problem, SNMF-Link learns the clustering indicators of each link. Since the traditional multiplicative update rule (MUR) based optimization algorithm for SNMF suffers from slow convergence, we apply the augmented Lagrangian method (ALM) to optimize SNMF efficiently. Experimental results show that SNMF-Link is much more efficient than the representative clustering algorithms without reducing the OCD performance.

Keywords: Symmetric matrices; Convergence; Optimization; Partitioning algorithms; Chlorine; Complex networks; Image edge detection
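
For orientation, symmetric NMF is commonly written as the non-negative least-squares problem below. This is the generic textbook form and not necessarily the exact objective optimized by SNMF-Link. The symbols are assumptions introduced here for illustration: $A$ is a symmetric non-negative $n \times n$ similarity matrix, $k$ is the number of clusters, and $H$ is the non-negative factor whose rows are read as clustering indicators.

```latex
\min_{H \in \mathbb{R}^{n \times k},\; H \ge 0} \; \left\| A - H H^{\top} \right\|_F^2
```

Under this reading, item $i$ is assigned to the cluster (or clusters) with the largest entries in row $i$ of $H$. The abstract's point is that solving a problem of this kind with multiplicative updates converges slowly, which motivates the authors' use of the augmented Lagrangian method.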

CRAWL: A Trace Routing Algorithm Based on Hybrid Two-Layer Topology

International Conference on Computer Sciences and Applications (CSA)

Authors: Li-Ming Zheng, Xiao-Dong Tan, Wei-Dong Sun, Xiao-Dong Li. Affiliations: Department of Electronics Technology, Armed Police Officer Academy, Chengdu, China; National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; Ministry of Scientific Research, Armed Police Officer Academy, Chengdu, China

ISBN: (print) 9781479999620

Data distribution is a key technology for resource convergence and sharing in distributed environments. To better meet the requirement for real-time data distribution in dynamic networks, a trace routing algorithm named CRAWL, based on a hybrid two-layer topology, is put forward. The algorithm relies on an overlay topology named CBDLO, whose upper layer consists of multiple distributed balanced binary trees corresponding to different properties and whose lower layer is an unstructured topology. CRAWL forwards data on the lower unstructured topology by random walk, so that the data can reach the corresponding upper-topology entry. It also includes a matching algorithm named CDM, which matches data properties in parallel on the upper distributed balanced binary trees and transmits the matched data to the nodes that are interested in the data. The experimental results show that the algorithm can effectively support large-scale data distribution in a dynamic network and reduce distribution overhead and matching delays.

Keywords: Topology; Peer-to-peer computing; Routing; Distributed databases; Network topology; Heuristic algorithms; Binary trees
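
The random-walk forwarding step mentioned in the abstract above can be illustrated with a small sketch: a message hops to a uniformly chosen neighbor on the unstructured lower layer until it reaches a node that serves as an entry to an upper-layer tree. The graph, the node names, the `max_hops` cutoff and the function name are all hypothetical; the abstract gives no details of CRAWL's actual forwarding rules, so this shows only the generic technique.

```python
import random
from typing import Dict, List, Optional, Set


def random_walk_to_entry(
    neighbors: Dict[str, List[str]],
    start: str,
    entries: Set[str],
    max_hops: int,
    rng: random.Random,
) -> Optional[List[str]]:
    """Forward hop by hop to a random neighbor until an entry node is reached.

    Returns the path taken, or None if no entry is reached within max_hops.
    """
    path = [start]
    node = start
    for _ in range(max_hops):
        if node in entries:
            return path
        node = rng.choice(neighbors[node])  # uniform choice among neighbors
        path.append(node)
    return path if node in entries else None


# Hypothetical unstructured lower layer; "e1" stands for an entry
# into one of the upper-layer balanced binary trees.
lower_layer = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "e1"],
    "e1": ["c"],
}

path = random_walk_to_entry(
    lower_layer, start="a", entries={"e1"}, max_hops=1000, rng=random.Random(0)
)
print("path taken:", path)
```

A hop limit is used here because a pure random walk has no guaranteed termination time; whether and how CRAWL bounds the walk is not stated in the abstract.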
