Search Results - Inner Mongolia University Library

A GPU-based parallel WFST decoder on nnet3

AIP Conference Proceedings, 2019, Vol. 2073, No. 1

Authors: Yong Wang, Jie Liu, Chen Zhou, Zhengbin Pang, Shengguo Li, Chunye Gong, Xinbiao Gan, Yurong Li. Affiliation: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China

One performance-intensive part of automatic speech recognition is weighted finite-state transducer (WFST) decoding. To address this bottleneck, we extend parallel graphics processing unit (GPU) computing to the decoding stage. We describe extension work based on the Kaldi toolkit for speech recognition research. Our work supports WFST decoding on Kaldi neural nets with the CUDA toolkit. Our paper also extends an efficient parallel Viterbi beam decoding algorithm to lower the real-time factor (RTF) of speech recognition. Together with our optimization algorithm, we reach a 2.3x speedup on AISHELL corpus decoding. We also implement an nnet3 decoder that improves real-time speed without raising the word error rate.
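The abstract names Viterbi beam decoding and the real-time factor without showing either. The following is a minimal, sequential CPU sketch of the general technique, not the authors' GPU implementation or Kaldi's API. The toy graph `ARCS`, the per-frame costs `FRAMES`, and both function names are hypothetical, and epsilon arcs are assumed absent for brevity.

```python
import math

def viterbi_beam_decode(arcs, start, finals, frame_costs, beam):
    """Token-passing Viterbi search with beam pruning.

    All costs are negative log-probabilities (lower is better).
    arcs: state -> list of (next_state, input_label, output_label, graph_cost)
    frame_costs: one dict per frame mapping input_label -> acoustic cost
    """
    tokens = {start: (0.0, [])}  # state -> (best cost so far, output labels)
    for costs in frame_costs:
        new_tokens = {}
        for state, (cost, words) in tokens.items():
            for nxt, ilabel, olabel, graph_cost in arcs.get(state, []):
                if ilabel not in costs:
                    continue
                total = cost + graph_cost + costs[ilabel]
                # Viterbi recombination: keep only the cheapest token per state.
                if nxt not in new_tokens or total < new_tokens[nxt][0]:
                    new_words = words + [olabel] if olabel else words
                    new_tokens[nxt] = (total, new_words)
        if not new_tokens:
            return math.inf, []
        # Beam pruning: drop tokens worse than this frame's best cost plus the beam.
        best = min(c for c, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    reachable = [tokens[s] for s in finals if s in tokens]
    return min(reachable, key=lambda t: t[0]) if reachable else (math.inf, [])

def real_time_factor(decode_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return decode_seconds / audio_seconds

# Hypothetical toy decoding graph and per-frame acoustic costs.
ARCS = {
    0: [(1, "a", "yes", 0.5), (2, "b", "no", 0.7)],
    1: [(1, "a", "", 0.1), (3, "c", "", 0.2)],
    2: [(3, "c", "", 0.2)],
}
FRAMES = [{"a": 0.2, "b": 1.5}, {"a": 0.3, "c": 0.9}, {"c": 0.1}]

cost, words = viterbi_beam_decode(ARCS, 0, {3}, FRAMES, beam=5.0)
```

A narrower `beam` discards more tokens per frame, trading search accuracy for speed. The inner loops over tokens and arcs are the part a GPU decoder would parallelize.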

Corrigendum to “A distributed Relation Detection Approach in the Internet of Things”

Mobile Information Systems, 2019, Vol. 2019, No. 1

Authors: Weiping Zhu, Hongliang Lu, Xiaohui Cui, Jiannong Cao. Affiliations: International School of Software, Wuhan University, Wuhan ***; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha ***; Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong

Predicting potential gene ontology from cellular response data

5th International Conference on Bioinformatics and Computational Biology, ICBCB 2017

Authors: Hong, Hao; Yin, Xiaoyao; Li, Fei; Guan, Naiyang; Bo, Xiaochen; Luo, Zhigang. Affiliations: Department of Chemistry and Biology, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China

ISBN: 9781450348270 (print)

Ontologies have proven to be useful for capturing and organizing knowledge as a hierarchical set of terms and their relationships. However, curating gene ontology data by hand requires specialized knowledge of a particular field and is inefficient. Inferring gene ontology from the exponentially growing body of biological data is therefore attracting increasing attention. Based on the Library of Integrated Network-Based Cellular Signatures (LINCS) data, we hypothesized that genes participating in analogous biological processes might affect cells in similar ways. By assessing the cellular response after genes were knocked out, we built a similarity matrix with Gene Set Enrichment Analysis (GSEA) and clustered the genes with the affinity propagation algorithm. Next, we mapped the clustering result to gene ontology biological process data for annotation and enrichment analysis, which confirmed our hypothesis and made it possible, for the first time, to predict biological processes for unannotated genes from cellular response data after gene knockout. We further validated the rationality of the approach using the gene ontology molecular function data. © 2017 ACM.

Keywords: Gene Ontology

Crowd intelligence in AI 2.0 era

Frontiers of Information Technology & Electronic Engineering, 2017, Vol. 18, No. 1, pp. 15-43

Authors: Wei LI, Wen-jun WU, Huai-min WANG, Xue-qi CHENG, Hua-jun CHEN, Zhi-hua ZHOU, Rong DING. Affiliations: State Key Laboratory of Software Development, Beihang University; National Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology; Institute of Computing Technology, Chinese Academy of Sciences; College of Computer Science and Technology, Zhejiang University; National Key Laboratory for Novel Software Technology, Nanjing University

The Internet-based cyber-physical world has profoundly changed the information environment for the development of artificial intelligence (AI), bringing a new wave of AI research and promoting it into the new era of AI 2.0. As one of the most prominent characteristics of research in the AI 2.0 era, crowd intelligence has attracted much attention from both industry and research communities. Specifically, crowd intelligence provides a novel problem-solving paradigm through gathering the intelligence of crowds to address challenges. In particular, due to the rapid development of the sharing economy, crowd intelligence not only becomes a new approach to solving scientific challenges, but has also been integrated into all kinds of application scenarios in daily life, e.g., online-to-offline (O2O) applications, real-time traffic monitoring, and logistics management. In this paper, we survey existing studies of crowd intelligence. First, we describe the concept of crowd intelligence, and explain its relationship to the existing related concepts, e.g., crowdsourcing and human computation. Then, we introduce four categories of representative crowd intelligence platforms. We summarize three core research problems and the state-of-the-art techniques of crowd intelligence. Finally, we discuss promising future research directions of crowd intelligence.

Keywords: Crowd intelligence; Artificial intelligence 2.0; Crowdsourcing; Human computation

Fine-grained checkpoint based on non-volatile memory

Frontiers of Information Technology & Electronic Engineering, 2017, Vol. 18, No. 2, pp. 220-234

Authors: Wen-zhe ZHANG, Kai LU, Mikel LUJAN, Xiao-ping WANG, Xu ZHOU. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha 410072, China; School of Computer, The University of Manchester, Manchester M13 9PL, UK

New non-volatile memory (e.g., phase-change memory) provides fast access, large capacity, byte-addressability, and non-volatility. These features, collectively fast-byte-persistency, will bring new opportunities to fault tolerance. We propose a fine-grained checkpoint based on non-volatile memory. We extend the current virtual memory manager to manage non-volatile memory, and design a persistent heap with support for fast allocation and checkpointing of persistent objects. To achieve a fine-grained checkpoint, we scatter objects across virtual pages and rely on hardware page-protection to monitor the modifications. In our system, two objects in different virtual pages may reside on the same physical page. Modifying one object would not interfere with the other object. This allows us to monitor and checkpoint objects smaller than 4096 bytes in a fine-grained way. Compared with previous page-grained checkpoint mechanisms, our new checkpoint method can greatly reduce the data copied at checkpoint time and better leverage the limited bandwidth of non-volatile memory.

Keywords: Non-volatile memory; Byte-persistency; Persistent heap; Fine-grained checkpoint

Collaborative deep learning across multiple data centers

arXiv, 2018

Authors: Xu, Kele; Mi, Haibo; Feng, Dawei; Wang, Huaimin; Chen, Chuan; Zheng, Zibin; Lan, Xu. Affiliations: National Key Laboratory of Parallel and Distributed Processing, Changsha, China; College of Computer, National University of Defense Technology, Changsha, China; School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China; Queen Mary University of London, London, United Kingdom

Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require centralizing the multi-datacenter data for performance reasons. In practice, however, it is often infeasible to transfer all data to a centralized data center, due not only to bandwidth limitations but also to the constraints of privacy regulations. Model averaging is a conventional choice for data-parallel training, but previous studies claim it is ineffective because deep neural networks are often non-convex. In this paper, we argue that model averaging can be effective in the decentralized environment by using two strategies, namely, the cyclical learning rate and the increased number of epochs for local model training. With the two strategies, we show that model averaging can provide competitive performance in the decentralized mode compared to the data-centralized one. In a practical environment with multiple data centers, we conduct extensive experiments using state-of-the-art deep network architectures on different types of data. Results demonstrate the effectiveness and robustness of the proposed method. Copyright © 2018, The Authors. All rights reserved.
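The two strategies named in the abstract, a cyclical learning rate and several local epochs before averaging, can be illustrated with a toy sketch. It is not the authors' implementation. The triangular schedule shape, the one-parameter model y = w * x, the site datasets, and all function names are assumptions chosen only to keep the example self-contained.

```python
def triangular_cyclical_lr(step, base_lr, max_lr, half_cycle):
    """Triangular schedule: rises linearly from base_lr to max_lr, then falls back."""
    position = step % (2 * half_cycle)
    distance_from_peak = abs(position - half_cycle) / half_cycle  # 1 at ends, 0 at peak
    return base_lr + (max_lr - base_lr) * (1.0 - distance_from_peak)

def average_models(models):
    """Element-wise average of parameter dicts (name -> list of floats)."""
    n = len(models)
    return {
        name: [sum(m[name][i] for m in models) / n for i in range(len(models[0][name]))]
        for name in models[0]
    }

def local_train(weights, data, epochs, base_lr, max_lr, half_cycle):
    """Fit y = w * x by SGD on one site's private data with the cyclical schedule."""
    w = weights["w"][0]
    step = 0
    for _ in range(epochs):
        for x, y in data:
            lr = triangular_cyclical_lr(step, base_lr, max_lr, half_cycle)
            w -= lr * 2 * (w * x - y) * x  # gradient of the squared error (w*x - y)^2
            step += 1
    return {"w": [w]}

def averaging_round(global_weights, site_datasets, **train_args):
    """Each site trains locally from the shared weights; only weights are averaged."""
    local_models = [local_train(global_weights, d, **train_args) for d in site_datasets]
    return average_models(local_models)

# Hypothetical data held at two sites; both follow y = 3x and never leave their site.
SITES = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]

weights = {"w": [0.0]}
for _ in range(5):
    weights = averaging_round(
        weights, SITES, epochs=3, base_lr=0.01, max_lr=0.1, half_cycle=4
    )
w_final = weights["w"][0]
```

Only model parameters cross site boundaries in this sketch, which is the property that makes averaging attractive when raw data cannot be centralized.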

Keywords: Network architecture

A Scalable and Flexible Monitoring System Framework for Supercomputers

2017 International Conference on Computer, Electronics and Communication Engineering (CECE 2017)

Authors: Tong XIAO, Kai LU. Affiliations: College of Computer, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

The demand for more powerful computing capability is never fully met, which has driven the continuous improvement of supercomputer performance. A more powerful supercomputer tends to have a larger system scale, which poses serious challenges for system management; monitoring the system's state is a critical one. To address this problem, this paper proposes a scalable and flexible monitoring system framework that can monitor supercomputers with tens of thousands of nodes effectively and efficiently. We first give an overview of the framework and then focus on the Super Computer System Description Language (SCSDL), which is key to the framework. Finally, we explain some implementation techniques and present the client GUIs of a job monitoring system and an error monitoring system built on this framework for Tianhe-2. These show that the framework scales well and is flexible enough to monitor Tianhe-2, which has 16,000 nodes, effectively and efficiently.

Keywords: Monitoring system; Framework; Supercomputer; Scalable; Flexible

Image Annotation by Object Hypotheses-oriented Deep Neural Networks

2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017)

Authors: Fang MA, Shao-he LV, Ke-xin ZHENG, Chi JIN, Fei CHEN, Ke YANG, and Yong DOU. Affiliations: National Laboratory for Parallel and Distributed Processing, National University of Defense Technology; University of South China, School of Computer Science and Technology

Image annotation generates a set of semantic labels that describe the contents of an input *** deep learning techniques have achieved significant success in many areas of image *** this paper, we present a multi-label image annotation method that combines unsupervised object hypothesis generation and deep neural *** an image, object hypotheses are generated in an unsupervised *** we extract the image features for each hypothesis with a deep neural network *** combining the features of all hypotheses, we get the features of the entire ***, we calculate for each label the probability that the label is correlated with the given *** can be trained in an end-to-end way using the standard backward propagation *** results on multiple benchmark datasets show that our method is better than the state-of-the-art ones.

Keywords: Deep learning; Multi-label annotation; Object hypotheses

Detailed and clock-driven simulation for HPC interconnection network

Frontiers of Computer Science, 2016, Vol. 10, No. 5, pp. 797-811

Authors: Wenhao ZHOU, Juan CHEN, Chen CUI, Qian WANG, Dezun DONG, Yuhua TANG. Affiliations: State Key Laboratory of High Performance Computing, School of Computer, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China

The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed and clock-driven HPC interconnection network simulation platform, called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of the detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters in terms of topology and routing, and supports router on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. We also simulate the network behaviors with different network structures and low-power modes. The results are consistent with the theoretical analyses.

Keywords: high performance computing; clock-driven simulation; interconnection network; BookSim

Optimizing guest swapping using elastic and transparent memory provisioning on virtualization platform

Frontiers of Computer Science, 2016, Vol. 10, No. 5, pp. 908-924

Authors: Xi LI, Pengfei ZHANG, Rui CHU, Huaimin WANG. Affiliations: School of Information Science and Engineering, Central South University, Changsha 410083, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410008, China

On virtualization platforms, peak memory demand caused by hotspot applications often triggers page swapping in the guest OS, causing performance degradation inside and outside of the virtual machine (VM). Even when the host holds sufficient memory pages, the guest OS cannot use free pages in the host directly because of the semantic gap between the virtual machine monitor (VMM) and the guest operating system (OS). Our work aims at utilizing the free memory scattered across multiple hosts in a virtualization environment to improve the performance of guest swapping in a transparent and implicit way. Based on an analysis of the behavioral characteristics of guest swapping, we design and implement a distributed and scalable framework, HybridSwap. It dynamically constructs virtual swap pools using various policies, and builds up a synthetic swapping mechanism in a peer-to-peer way, which can adaptively choose different virtual swap pools. We implement a prototype of HybridSwap and evaluate it with several benchmarks in different scenarios. The evaluation results demonstrate that our solution improves guest swapping efficiency and doubles performance in some cases. Even in the worst case, the system overhead introduced by HybridSwap is acceptable.

Keywords: virtualization; memory management; guest swapping; performance degradation
