Search results - Inner Mongolia University Library

arXiv, 2018

Authors: Shen, Junzhong; Qiao, Yuran; Huang, You; Wen, Mei; Zhang, Chunyuan. Affiliations: College of Computer, National University of Defense Technology, Changsha 410073, China; National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China

Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work focuses only on accelerating matrix multiplication on FPGAs by adopting a linear systolic array. This paper extends that architecture by proposing a scalable and highly configurable multi-array architecture. In addition, we propose a work-stealing scheme to balance the workload partition among multiple linear arrays. Furthermore, an analytical model is developed to determine the optimal design parameters. Experiments on a real-life convolutional neural network (CNN) show that we can obtain the optimal extension of the linear array architecture. Copyright © 2018, The Authors. All rights reserved.
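The abstract does not describe how the work-stealing scheme is implemented in hardware. As a rough software illustration of the general idea only, the sketch below simulates several workers (standing in for linear arrays) that each own a queue of tiles; a worker whose queue is empty takes a tile from the back of the longest remaining queue. The function name, the integer tile costs, and the "steal from the longest queue" policy are all assumptions made for this example, not details from the paper.

```python
from collections import deque
from typing import List


def simulate_work_stealing(tile_costs: List[List[int]]) -> List[int]:
    """Return the finish time of each worker under a simple stealing policy.

    tile_costs[i] lists the cost (in cycles) of the tiles initially
    assigned to worker i. This is a hypothetical model, not the paper's design.
    """
    queues = [deque(q) for q in tile_costs]
    finish = [0] * len(queues)
    while any(queues):
        # The worker that becomes idle first picks up the next tile.
        i = min(range(len(queues)), key=lambda k: finish[k])
        if queues[i]:
            # Prefer the worker's own queue, taken from the front.
            finish[i] += queues[i].popleft()
        else:
            # Own queue is empty: steal from the back of the longest queue.
            victim = max(range(len(queues)), key=lambda k: len(queues[k]))
            finish[i] += queues[victim].pop()
    return finish
```

With an uneven initial partition such as `[[4, 4, 4, 4], [4]]`, the static schedule would finish when the first worker completes all four of its tiles, whereas the stealing schedule lets the second worker take over one of them.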

Keywords: Field programmable gate arrays (FPGA)

Sample dropout for audio scene classification using multi-scale dense connected convolutional neural network

arXiv, 2018

Authors: Feng, Dawei; Xu, Kele; Mi, Haibo; Liao, Feifan; Zhou, Yan. Affiliations: Science and Technology on Parallel and Distributed Laboratory, School of Computer, National University of Defense Technology, Changsha 410073, China; School of Information and Communication, National University of Defense Technology, Wuhan 430010, China

Acoustic scene classification is an intricate problem for a machine. In this emerging field of research, deep Convolutional Neural Networks (CNN) achieve convincing results. In this paper, we explore the use of a multi-scale densely connected convolutional neural network (DenseNet) for the classification task, with the goal of improving classification performance, as multi-scale features can be extracted from the time-frequency representation of the audio signal. Most previous CNN-based audio scene classification approaches aim to improve classification accuracy by employing regularization techniques, such as dropout of hidden units and data augmentation, to reduce overfitting. It is widely known that outliers in the training set have a strongly negative influence on the trained model, and culling the outliers may improve classification performance, yet this is under-explored in previous studies. In this paper, inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed, which aims to remove outliers from the training dataset. Using the DCASE 2017 audio scene classification datasets, the experimental results demonstrate that the proposed multi-scale DenseNet provides better performance than the traditional single-scale DenseNet, while the sample dropout method further improves the classification robustness of the multi-scale DenseNet. Copyright © 2018, The Authors. All rights reserved.

Keywords: Statistics

Efficient detection of dangling pointer error for C/C++ programs 2

2nd Annual International Conference on Information System and Artificial Intelligence, ISAI 2017

Authors: Zhang, Wenzhe. Affiliations: Science and Technology on Parallel and Distributed Laboratory, State Key Laboratory of High Performance Computing, State Key Laboratory of High-end Server and Storage Technology, College of Computer, National University of Defense Technology, Changsha, China

Dangling pointer errors are pervasive in C/C++ programs and are very hard to detect. This paper introduces an efficient detector for dangling pointer errors in C/C++ programs. By selectively leaving some memory accesses unmonitored, our method reduces the memory monitoring overhead and thus achieves better performance than previous methods. Experiments show that our method achieves an average speedup of 9% over the previous compiler-instrumentation-based method and more than 50% over the previous page-protection-based method. © Published under licence by IOP Publishing Ltd.

Keywords: Errors

Loss rank mining: A general hard example mining method for real-time Detectors

arXiv, 2018

Authors: Yu, Hao; Zhang, Zhaoning; Qin, Zheng; Wu, Hao; Li, Dongsheng; Zhao, Jun; Lu, Xicheng. Affiliations: Science and Technology on Parallel and Distributed Laboratory, National University of Defense Technology, Changsha, China; College of Electronic and Engineering, National University of Defense Technology, Changsha, China; College of Meteorology and Oceanology, National University of Defense Technology, Changsha, China

Modern object detectors usually suffer from low accuracy, as foregrounds are drowned out by large numbers of backgrounds and become hard examples during training. Compared with proposal-based detectors, real-time detectors are in far more serious trouble, since to achieve real-time rates they give up the region-proposing stage, which is used to filter out a majority of backgrounds. Although foregrounds as hard examples urgently need to be mined from the backgrounds, a considerable number of state-of-the-art real-time detectors, such as the YOLO series, have yet to profit from existing hard example mining methods, as using these methods requires detectors to meet a series of prerequisites. In this paper, we propose a general hard example mining method named Loss Rank Mining (LRM) to fill the gap. LRM is a general method for real-time detectors, as it uses the final feature map, which exists in all real-time detectors, to mine hard examples. With LRM, some elements representing easy examples in the final feature map are filtered out and detectors are forced to concentrate on hard examples during training. Extensive experiments validate the effectiveness of our method. With our method, the improvements of the YOLOv2 detector on the auto-driving-related dataset KITTI and the more general dataset PASCAL VOC are over 5% and 2% mAP, respectively. In addition, LRM is the first hard example mining strategy that fits YOLOv2 and makes it better suited to real scenarios where both real-time rates and accurate detection are strongly demanded. Copyright © 2018, The Authors. All rights reserved.
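The abstract says that elements of the final feature map representing easy examples are filtered out, but it does not give the ranking rule. A minimal sketch of one plausible reading follows, assuming each feature-map element already has a scalar loss and that the `keep` highest-loss elements are retained. The function names, the plain nested-list representation, and the fixed `keep` count are assumptions for illustration, not the authors' implementation.

```python
from typing import List


def loss_rank_mask(loss_map: List[List[float]], keep: int) -> List[List[int]]:
    """Build a 0/1 mask that keeps the `keep` highest-loss elements."""
    # Flatten to (loss, row, col) so elements can be ranked by loss.
    cells = [
        (loss, r, c)
        for r, row in enumerate(loss_map)
        for c, loss in enumerate(row)
    ]
    cells.sort(key=lambda t: t[0], reverse=True)
    mask = [[0] * len(row) for row in loss_map]
    for _, r, c in cells[: max(0, keep)]:
        mask[r][c] = 1  # hard example: contributes to the training loss
    return mask


def masked_loss(loss_map: List[List[float]], mask: List[List[int]]) -> float:
    """Sum the losses of the elements that survived the filter."""
    return sum(
        loss * m
        for loss_row, mask_row in zip(loss_map, mask)
        for loss, m in zip(loss_row, mask_row)
    )
```

In a real detector the mask would be applied to the per-element loss tensor before backpropagation, so that gradients come only from the retained elements.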

Keywords: Mining

Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors

International Joint Conference on Neural Networks

Authors: Hao Yu; Zhaoning Zhang; Zheng Qin; Hao Wu; Dongsheng Li; Jun Zhao; Xicheng Lu. Affiliations: Science and Technology on Parallel and Distributed Laboratory, National University of Defense Technology, Changsha, China; College of Electronic and Engineering, National University of Defense Technology, Changsha, China; College of Meteorology and Oceanology, National University of Defense Technology, Changsha, China

Modern object detectors usually suffer from low accuracy, as foregrounds are drowned out by large numbers of backgrounds and become hard examples during training. Compared with proposal-based detectors, real-time detectors are in far more serious trouble, since to achieve real-time rates they give up the region-proposing stage, which is used to filter out a majority of backgrounds. Although foregrounds as hard examples urgently need to be mined from the backgrounds, a considerable number of state-of-the-art real-time detectors, such as the YOLO series, have yet to profit from existing hard example mining methods, as using these methods requires detectors to meet a series of prerequisites. In this paper, we propose a general hard example mining method named Loss Rank Mining (LRM) to fill the gap. LRM is a general method for real-time detectors, as it uses the final feature map, which exists in all real-time detectors, to mine hard examples. With LRM, some elements representing easy examples in the final feature map are filtered out and detectors are forced to concentrate on hard examples during training. Extensive experiments validate the effectiveness of our method. With our method, the improvements of the YOLOv2 detector on the auto-driving-related dataset KITTI and the more general dataset PASCAL VOC are over 5% and 2% mAP, respectively. In addition, LRM is the first hard example mining strategy that fits YOLOv2 and makes it better suited to real scenarios where both real-time rates and accurate detection are strongly demanded.

Keywords: Detectors; Real-time systems; Feature extraction; Training; Object detection; Pipelines; Task analysis

Collaborative deep learning across multiple data centers

arXiv, 2018

Authors: Xu, Kele; Mi, Haibo; Feng, Dawei; Wang, Huaimin; Chen, Chuan; Zheng, Zibin; Lan, Xu. Affiliations: National Key Laboratory of Parallel and Distributed Processing, Changsha, China; College of Computer, National University of Defense Technology, Changsha, China; School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China; Queen Mary University of London, London, United Kingdom

Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require the multi-datacenter data to be centralized for performance reasons. In practice, however, it is often infeasible to transfer all data to a centralized data center, due not only to bandwidth limitations but also to the constraints of privacy regulations. Model averaging is a conventional choice for data-parallel training, but previous studies claim it is ineffective because deep neural networks are often non-convex. In this paper, we argue that model averaging can be effective in the decentralized environment by using two strategies, namely a cyclical learning rate and an increased number of epochs for local model training. With the two strategies, we show that model averaging can provide competitive performance in the decentralized mode compared with the data-centralized one. In a practical environment with multiple data centers, we conduct extensive experiments using state-of-the-art deep network architectures on different types of data. Results demonstrate the effectiveness and robustness of the proposed method. Copyright © 2018, The Authors. All rights reserved.
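The two strategies named in the abstract can be illustrated on a toy problem. The sketch below is a minimal example under several assumptions: a one-parameter linear model `y = w * x` stands in for a deep network, the cyclical schedule is a simple triangular wave, and every site restarts the schedule at each round. All function names and hyperparameters are hypothetical, and because this toy model is convex it does not reproduce the non-convex setting the paper addresses.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def cyclical_lr(step: int, base_lr: float, max_lr: float, half_cycle: int) -> float:
    """Triangular schedule: base_lr -> max_lr -> base_lr every 2 * half_cycle steps."""
    pos = step % (2 * half_cycle)
    frac = pos / half_cycle if pos <= half_cycle else 2 - pos / half_cycle
    return base_lr + (max_lr - base_lr) * frac


def local_train(w: float, data: List[Sample], epochs: int,
                base_lr: float, max_lr: float, half_cycle: int) -> float:
    """Run several local epochs of SGD on squared error at one site."""
    step = 0
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= cyclical_lr(step, base_lr, max_lr, half_cycle) * grad
            step += 1
    return w


def average_round(w: float, sites: List[List[Sample]], epochs: int,
                  base_lr: float, max_lr: float, half_cycle: int) -> float:
    """One communication round: train locally at every site, then average."""
    local = [local_train(w, d, epochs, base_lr, max_lr, half_cycle) for d in sites]
    return sum(local) / len(local)
```

Only the model parameter crosses site boundaries in `average_round`; the raw samples stay where they are, which is the property that matters under the bandwidth and privacy constraints described above.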

Keywords: Network architecture

Correction to: Type 2 Diabetes with Artificial Intelligence Machine Learning: Methods and Evaluation

Archives of Computational Methods in Engineering, 2021, vol. 28, no. 7, p. 5039

Authors: Ismail, Leila; Materwala, Huned; Tayefi, Maryam; Ngo, Phuong; Karduck, Achim P. Affiliations: Intelligent Distributed Computing and Systems Research Laboratory, Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates; National Water and Energy Center, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates; Norwegian Centre for E-Health Research, Tromsø, Norway; Faculty of Informatics, Furtwangen University, Furtwangen, Germany

Fine-grained checkpoint based on non-volatile memory

Frontiers of Information Technology & Electronic Engineering, 2017, vol. 18, no. 2, pp. 220-234

Authors: Wen-zhe ZHANG; Kai LU; Mikel LUJAN; Xiao-ping WANG; Xu ZHOU. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha 410072, China; School of Computer, The University of Manchester, Manchester M13 9PL, UK

New non-volatile memory (e.g., phase-change memory) provides fast access, large capacity, byte-addressability, and non-volatility. These features, which we call fast-byte-persistency, bring new opportunities for fault tolerance. We propose a fine-grained checkpoint based on non-volatile memory. We extend the current virtual memory manager to manage non-volatile memory, and design a persistent heap with support for fast allocation and checkpointing of persistent objects. To achieve a fine-grained checkpoint, we scatter objects across virtual pages and rely on hardware page protection to monitor modifications. In our system, two objects in different virtual pages may reside on the same physical page, and modifying one object does not interfere with the other. This allows us to monitor and checkpoint objects smaller than 4096 bytes in a fine-grained way. Compared with previous page-granularity checkpoint mechanisms, our new checkpoint method can greatly reduce the data copied at checkpoint time and better leverage the limited bandwidth of non-volatile memory.

Keywords: Non-volatile memory; Byte-persistency; Persistent heap; Fine-grained checkpoint

Determinants of pull-based development in the context of continuous integration

Science China (Information Sciences), 2016, vol. 59, no. 8, pp. 53-66

Authors: Yue YU; Gang YIN; Tao WANG; Cheng YANG; Huaimin WANG. Affiliations: College of Computer, National University of Defense Technology; National Laboratory for Parallel and Distributed Processing

The pull-based development model, widely used by distributed software teams in open source communities, can efficiently gather the wisdom of crowds. Instead of sharing access to a central repository, contributors create a fork, update it locally, and request to have their changes merged back, i.e., submit a pull-request. On the one hand, this model lowers the barrier to entry for potential contributors, since anyone can submit pull-requests to any repository; on the other hand, it increases the burden on integrators, who are responsible for assessing the proposed patches and integrating the suitable changes into the central repository. The role of integrators in pull-based development is crucial. They must not only ensure that pull-requests meet the project's quality standards before being accepted, but also finish the evaluations in a timely manner. To keep up with the volume of incoming pull-requests, continuous integration (CI) is widely adopted to automatically build and test every pull-request at the time of submission. CI provides extra evidence about the quality of pull-requests, which helps integrators make the final decision (i.e., accept or reject). In this paper, we present a quantitative study that tries to discover which factors affect the process of the pull-based development model, including acceptance and latency, in the context of CI. Using regression modeling on data extracted from a sample of GitHub projects deploying the Travis-CI service, we find that the evaluation process is a complex issue, requiring many independent variables to explain adequately. In particular, CI is a dominant factor in the process: it not only has a great influence on the evaluation process per se, but also changes the effects of some traditional predictors.

Keywords: pull-request; continuous integration; GitHub; distributed software development; empirical analysis

Crowd intelligence in AI 2.0 era

Frontiers of Information Technology & Electronic Engineering, 2017, vol. 18, no. 1, pp. 15-43

Authors: Wei LI; Wen-jun WU; Huai-min WANG; Xue-qi CHENG; Hua-jun CHEN; Zhi-hua ZHOU; Rong DING. Affiliations: State Key Laboratory of Software Development, Beihang University; National Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology; Institute of Computing Technology, Chinese Academy of Sciences; College of Computer Science and Technology, Zhejiang University; National Key Laboratory for Novel Software Technology, Nanjing University

The Internet-based cyber-physical world has profoundly changed the information environment for the development of artificial intelligence (AI), bringing a new wave of AI research and promoting it into the new era of AI 2.0. As one of the most prominent characteristics of research in the AI 2.0 era, crowd intelligence has attracted much attention from both industry and research communities. Specifically, crowd intelligence provides a novel problem-solving paradigm that gathers the intelligence of crowds to address challenges. In particular, due to the rapid development of the sharing economy, crowd intelligence has not only become a new approach to solving scientific challenges, but has also been integrated into all kinds of application scenarios in daily life, e.g., online-to-offline (O2O) applications, real-time traffic monitoring, and logistics management. In this paper, we survey existing studies of crowd intelligence. First, we describe the concept of crowd intelligence and explain its relationship to existing related concepts, e.g., crowdsourcing and human computation. Then, we introduce four categories of representative crowd intelligence platforms. We summarize three core research problems and the state-of-the-art techniques of crowd intelligence. Finally, we discuss promising future research directions for crowd intelligence.

Keywords: Crowd intelligence; Artificial intelligence 2.0; Crowdsourcing; Human computation
