Search Results - Inner Mongolia University Library

21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019

Authors: Tang, Anyao Wu, Chengkun Liu, Jie Wang, Wei Yang, Xi Xing, Yuting Science and Technology on Parallel and Distributed Processing Laboratory Laboratory of Software Engineering for Complex Systems National University of Defense Technology Changsha 410073 China State Key Laboratory of High Performance Computing College of Computer National University of Defense Technology Changsha 410073 China College of Computer National University of Defense Technology Changsha 410073 China

ISBN: (print) 9781728120584

Author name disambiguation (AND) is an important task in scientific data mining, and the rapid growth of academic digital libraries has made it a major challenge. AND for a large number of authors is computationally intensive. In particular, an author's name in MEDLINE is represented by the full last name and initials, such as 'Zhang S', so many identical strings actually represent different authors. In this paper, we propose an efficient algorithm for parallel AND computation that mainly addresses load balancing across many computing nodes. It involves two strategies: (1) author-based load balancing, which splits the computation load across cores by author name label; and (2) a matrix-based strategy, which calculates the pairwise similarity between publications, saves the results in a matrix globally shared by all processes, and then groups the publications by breadth-first search. We combine the two strategies: the second is used for authors with a large number of documents, and the first for all other authors. We constructed a database of publications written by Chinese authors from MEDLINE, the biggest public database of biomedical literature (abstracts). For benchmarking, we tested our algorithm on a dataset of 1 million publications on the Tianhe-2A supercomputer. First, we trained an AND classifier that achieves an F1 score of 98.1%. The serial computation time is estimated at approximately 246 hours, while the parallel execution time is approximately 66 hours with four cores on a single node (a speedup of 3.7x). Finally, we reduced the total parallel computing time for 1 million documents to about 2 hours and achieved 65.8% parallel efficiency using 200 cores on 90 nodes. © 2019 IEEE.
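
The two strategies in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the similarity threshold of 0.5, the greedy largest-first assignment, and the pair-count cost model n(n-1)/2 are all assumptions made for illustration. The first function groups publications by breadth-first search over a precomputed pairwise similarity matrix; the second assigns author-name blocks to cores so that the estimated pairwise work stays balanced.

```python
from collections import deque
import heapq


def group_publications(similarity, threshold=0.5):
    """Group publication indices into connected components using BFS.

    similarity[i][j] is a precomputed pairwise score; two publications
    are linked when the score is at least `threshold` (an assumed rule).
    """
    n = len(similarity)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        group = []
        while queue:
            i = queue.popleft()
            group.append(i)
            for j in range(n):
                if not seen[j] and similarity[i][j] >= threshold:
                    seen[j] = True
                    queue.append(j)
        groups.append(sorted(group))
    return groups


def assign_authors(doc_counts, n_cores):
    """Greedily assign author-name blocks to the least-loaded core.

    The cost of a block with n documents is taken to be the number of
    publication pairs, n * (n - 1) // 2 (an assumed cost model).
    """
    heap = [(0, core) for core in range(n_cores)]
    heapq.heapify(heap)
    assignment = {core: [] for core in range(n_cores)}
    # Largest blocks first, so small blocks can even out the load later.
    for name, n in sorted(doc_counts.items(), key=lambda kv: -kv[1]):
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + n * (n - 1) // 2, core))
    return assignment


# Hypothetical data: four publications sharing one name string.
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
groups = group_publications(sim)

# Hypothetical document counts per author-name label.
counts = {"Zhang S": 10, "Li W": 6, "Wang Y": 5, "Chen X": 2}
plan = assign_authors(counts, n_cores=2)
```

With this toy data, publications 0 and 1 end up in one group and 2 and 3 in another, and the largest name block gets a core to itself while the three smaller blocks share the other core.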

Keywords: Matrix algebra

A Heterogeneous Processor Design for CNN-Based AI Applications on IoT Devices

Procedia Computer Science, 2020, vol. 174, pp. 2-8

Authors: Zhiqiang Liu Jingfei Jiang Guoqing Lei Kai Chen Buyue Qin Xiaoqiang Zhao Artificial Intelligence Research Center National Innovation Institute of Defense Technology Beijing China National Laboratory for Parallel and Distributed Processing National University of Defense Technology Changsha China College of Computer National University of Defense Technology Changsha China College of System Engineering National University of Defense Technology Changsha China

This paper proposes a heterogeneous processor design for CNN-based AI applications on IoT devices. The heterogeneous processor contains an embedded RISC-V CPU that works as a general-purpose processor and an efficient CNN accelerator that supports a variety of CNN models through a list of macro instructions. For demonstration, we implement a prototype on an FPGA platform with the RISC-V CPU running at 20 MHz and the CNN accelerator running at 100 MHz. As a case study, we run a CNN-based face detection and recognition application on this prototype. The prototype processes one image in 0.72 seconds, and an ASIC implementation running at 400 MHz is estimated to process one image in less than 0.15 seconds, which satisfies the needs of many IoT scenarios such as access control systems and check-in systems.

Keywords: Processor RISC-V Accelerator AI (Artificial Intelligence) CNN (Convolutional Neural Network) IoT (Internet of Things)

Learning generic diffusion processes for image restoration

29th British Machine Vision Conference, BMVC 2018

Authors: Qiao, Peng Dou, Yong Chen, Yunjin Feng, Wensen Science and Technology on Parallel and Distributed Laboratory National University of Defense Technology Changsha China ULSee Inc. Hangzhou China College of Computer Science and Software Engineering Shenzhen University Shenzhen China

Image restoration problems are typical ill-posed problems in which the regularization term plays an important role. A regularization term learned via generative approaches is easy to transfer to various image restoration problems, but offers inferior restoration quality compared with one learned via discriminative approaches. By contrast, a regularization term learned via discriminative approaches is usually trained for a specific image restoration problem and fails on problems for which it was not trained. To address this issue, we propose a generic diffusion process (genericDP) to handle multiple Gaussian denoising problems based on the Trainable Non-linear Reaction Diffusion (TNRD) models. Instead of using one model, which consists of a diffusion term and a reaction term, for each Gaussian denoising problem as in TNRD, we enforce multiple TNRD models to share one diffusion term. The trained genericDP model provides both promising denoising performance and high training efficiency compared with the original TNRD models. We also transfer the trained diffusion term to non-blind deconvolution, which is unseen in the training phase. Experimental results show that the diffusion term trained for multiple Gaussian denoising problems can be transferred to non-blind image deconvolution as an image prior and provides competitive performance. © 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.

Keywords: Diffusion

Variational Distillation for Multi-View Learning

arXiv, 2022

Authors: Tian, Xudong Zhang, Zhizhong Wang, Cong Zhang, Wensheng Qu, Yanyun Ma, Lizhuang Wu, Zongze Xie, Yuan Tao, Dacheng School of Computer Science and Technology East China Normal University Shanghai 200062 China The Distributed and Parallel Software Laboratory 2012 Labs Huawei Technologies Hangzhou China Institute of Automation Chinese Academy of Sciences Beijing 100190 China School of Information Science and Technology Xiamen University Fujian 361005 China School of Computer Science and Technology East China Normal University Shanghai China The School of Electronic Information and Electrical Engineering Shanghai Jiao Tong University China College of Mechatronics and Control Engineering Shenzhen University Shenzhen China JD Explore Academy China The University of Sydney Australia

Information Bottleneck (IB) based multi-view learning provides an information-theoretic principle for seeking shared information contained in heterogeneous data descriptions. However, its great success is generally attributed to estimating the multivariate mutual information (MI), which is intractable when the network becomes complicated. Moreover, the representation learning tradeoffs, i.e., the prediction-compression and sufficiency-consistency tradeoffs, make it hard for the IB to satisfy both requirements simultaneously. In this paper, we design several variational information bottlenecks to exploit two key characteristics (i.e., sufficiency and consistency) for multi-view representation learning. Specifically, we propose a Multi-View Variational Distillation (MV2D) strategy that provides a scalable, flexible and analytical solution to fitting MI given an arbitrary number of input views, without explicitly estimating it. With rigorous theoretical guarantees, our approach enables the IB to grasp the intrinsic correlation between observations and semantic labels, naturally producing predictive and compact representations. In addition, our information-theoretic constraint effectively neutralizes the sensitivity to heterogeneous data by eliminating both task-irrelevant and view-specific information, preventing both tradeoffs in multi-view cases. To verify our theoretically grounded strategies, we apply our approaches to various benchmarks under three different applications. Extensive experiments quantitatively and qualitatively demonstrate the effectiveness of our approach against state-of-the-art methods. Copyright © 2022, The Authors. All rights reserved.

Keywords: Distillation

Software Effective Evaluating Technology: SWEET

IEEE International Conference on Software Engineering and Service Sciences (ICSESS)

Authors: Yaozong Li Tao Wang Yue Yu Dongyang Hu National Laboratory for Parallel and Distributed Processing National University of Defence Technology Changsha China

ISBN: (digital) 9781728109459

ISBN: (print) 9781728109466

Open source communities host a large number of software resources. These resources are distributed across different communities or repositories, each of which requires different software characteristics. This makes it difficult to evaluate software quality using traditional methods. In this case, a novel open-source software sorting algorithm may be an effective solution. Considering both subjective and objective levels, we propose a new method for software sorting and retrieval. At the subjective level, metrics are selected from the corresponding collaborative development community based on the software topic. At the objective level, metrics are obtained from the group emotional evaluation value of the knowledge sharing community. We demonstrate the effectiveness of the method through comparison experiments. Combined with the SolrCloud tool, this method has been integrated into the OSSEAN platform.

Keywords: Open source software Sorting Collaboration Software algorithms Measurement Tools

Load Balancing a Multi-Block Grids-based Application on Heterogeneous Platform

IEEE International Conference on Computational Science and Engineering, CSE

Authors: Yonggang Che Chuanfu Xu Zhenghua Wang Institute for Quantum Information & State Key Lab. of High Performance Computing College of Computer National University of Defense Technology Changsha China Science and Technology on Parallel and Distributed Processing Lab National University of Defense Technology Changsha China

ISBN: (digital) 9781665403986

ISBN: (print) 9781665403993

This paper presents a load balancing method for a multi-block grid-based CFD (Computational Fluid Dynamics) application on a heterogeneous platform. The method includes an asymmetric task scheduling scheme and a load balancing model. The idea is to balance the computing speed between the CPU and the coprocessor by adjusting the workload and the number of threads on both sides. Optimal load balance parameters are selected empirically, guided by a performance model. Performance is evaluated on a server consisting of two Intel Xeon E5-2670 v3 CPUs and two MIC coprocessors (Xeon Phi 5110P and Xeon Phi 7120P), simulating turbulent combustion in a supersonic combustor. The results show that performance is highly sensitive to the load balance parameters. With the optimal parameters, heterogeneous computing achieves a maximum speedup of 2.30× for a 6-block mesh and 2.66× for an 8-block mesh over CPU-only computing.
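
The idea of balancing computing speed between the CPU and the coprocessor can be sketched as a proportional workload split. This is only an illustration under stated assumptions: the function names and the throughput figures are hypothetical, and the paper itself selects its parameters empirically with the help of a performance model rather than by this formula. The sketch assumes each device processes grid cells at a fixed measured rate, so both sides finish together when the work is divided in proportion to those rates.

```python
def split_cells(total_cells, cpu_rate, coproc_rate):
    """Split grid cells so both devices finish at about the same time.

    Rates are cells processed per second on each device (assumed to be
    measured beforehand). Returns (cpu_cells, coproc_cells).
    """
    cpu_share = cpu_rate / (cpu_rate + coproc_rate)
    cpu_cells = round(total_cells * cpu_share)
    return cpu_cells, total_cells - cpu_cells


def step_time(cpu_cells, coproc_cells, cpu_rate, coproc_rate):
    """Time of one step: the slower side determines the total."""
    return max(cpu_cells / cpu_rate, coproc_cells / coproc_rate)


# Hypothetical rates: the CPU is three times faster than the coprocessor.
cpu_cells, coproc_cells = split_cells(1000, cpu_rate=3.0, coproc_rate=1.0)
balanced = step_time(cpu_cells, coproc_cells, 3.0, 1.0)
cpu_only = step_time(1000, 0, 3.0, 1.0)
speedup = cpu_only / balanced
```

An unbalanced split leaves one device idle while the other finishes, which is one way to read the abstract's observation that performance is highly sensitive to the load balance parameters.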

Keywords: Computational modeling Computational fluid dynamics Load management Servers Task analysis Load modeling Coprocessors

An Efficient High-Precision Data Detection for Massive MU-MIMO Systems

International Conference on Electronic Engineering and Informatics (EEI)

Authors: Shikai Qiu Yu Wang Lirui Chen Zuocheng Xing National Laboratory for Parallel and Distributed Processing National University of Defense Technology Changsha Hunan

ISBN: (print) 9781728140773

Data detection is among the most crucial processing tasks for massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems. In this letter, we propose a novel, efficient, high-precision soft-output data detection algorithm that iteratively generates a signal vector while reducing complexity. The algorithm improves error-rate performance and reduces complexity by combining an optimization method, ADMM, with a modified neighborhood search algorithm. Simulation results demonstrate that the proposed detection achieves superior performance over existing methods at low computational complexity.

Keywords: computational complexity MIMO communication multi-access systems optimisation search problems signal detection Multi access systems MIMO communication tabu search complexity classes signal acquisition data detection multiple input multiple output systems

Algorithm and architecture for path metric aided bit-flipping decoding of polar codes

arXiv, 2019

Authors: Wang, Yu Chen, Lirui Wang, Qinglin Zhang, Yang Xing, Zuocheng National Laboratory for Parallel and Distributed Processing National University of Defense Technology Changsha China

Polar codes have attracted increasing attention from researchers in recent years because of their capacity-achieving property. However, their error-correction performance under successive cancellation (SC) decoding is inferior to that of other modern channel codes at short or moderate blocklengths. The SC-Flip (SCF) decoding algorithm outperforms SC decoding by identifying possibly erroneous decisions made in the initial SC decoding and flipping them in subsequent decoding attempts. However, it performs poorly when a codeword contains more than one erroneous decision. In this paper, we propose a path metric aided bit-flipping decoding algorithm to identify and correct more errors efficiently. In this algorithm, the bit-flipping list is generated based on both a log-likelihood ratio (LLR) based path metric and a bit-flipping metric. The path metric is used to verify the effectiveness of bit-flipping. To reduce the decoding latency and computational complexity, a corresponding pipeline architecture is designed. With this decoding algorithm and pipeline architecture, error-correction performance improves by up to 0.25 dB compared with SCF decoding at a frame error rate of 10^-4, with low average decoding latency. Copyright © 2019, The Authors. All rights reserved.

Keywords: Decoding

Versionized process based on non-volatile random-access memory for fine-grained fault tolerance

Frontiers of Information Technology & Electronic Engineering, 2018, vol. 19, no. 2, pp. 192-205

Authors: Wen-zhe ZHANG Kai LU Xiao-ping WANG Science and Technology on Parallel and Distributed Processing Laboratory College of Computer National University of Defense Technology

Non-volatile random-access memory (NVRAM) technology is maturing rapidly and its byte-persistence feature allows the design of new and efficient fault tolerance mechanisms. In this paper we propose the versionized process (VerP), a new process model based on NVRAM that is natively non-volatile and fault tolerant. We introduce an intermediate software layer that allows us to run a process directly on NVRAM and to put all the process states into NVRAM, and then propose a mechanism to versionize all the process data. Each piece of the process data is given a special version number, which increases with the modification of that piece of data. The version number can effectively help us trace the modification of any data and recover it to a consistent state after a system *** with traditional checkpoint methods, our work can achieve fine-grained fault tolerance at very little cost.
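
The version-number idea in the abstract above can be illustrated with a toy in-memory store. This is a sketch under assumptions, not the VerP mechanism: the class name, its methods, and the use of a single global counter as the version source are all hypothetical, and an ordinary Python dictionary stands in for NVRAM. Every write records a new version of the modified item, and recovery discards all versions newer than a chosen consistent point.

```python
class VersionedStore:
    """Toy key-value store that tags every write with a version number."""

    def __init__(self):
        self._history = {}  # key -> list of (version, value), oldest first
        self._clock = 0     # global version counter (an assumption)

    def write(self, key, value):
        """Record a new version of `key` and return its version number."""
        self._clock += 1
        self._history.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key):
        """Return the most recent value of `key`."""
        return self._history[key][-1][1]

    def recover(self, version):
        """Roll every item back to its state at `version`."""
        for key in list(self._history):
            kept = [(v, val) for v, val in self._history[key] if v <= version]
            if kept:
                self._history[key] = kept
            else:
                del self._history[key]  # item did not exist yet
        self._clock = version


store = VersionedStore()
store.write("x", 1)
consistent = store.write("y", 10)  # treat this point as consistent
store.write("x", 2)                # later modification to be undone
store.write("z", 5)
store.recover(consistent)
```

After recovery, `x` is back to its earlier value, `y` is unchanged, and `z` no longer exists, because only versions up to the chosen point are kept.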

Keywords: Non-volatile memory Byte-persistence Versionized process Version number

Distributed sparse bundle adjustment algorithm based on three-dimensional point partition and asynchronous communication

Frontiers of Information Technology & Electronic Engineering, 2018, vol. 19, no. 7, pp. 889-904

Authors: Xiao-long SHEN Yong DOU Steven MILLS David M EYERS Huan FENG Zhiyi HUANG College of Computer National University of Defense Technology Science and Technology on Parallel and Distributed Laboratory National University of Defense Technology Department of Computer Science University of Otago Department of Computer Science Tsinghua University

Sparse bundle adjustment (SBA) is a key but time- and memory-consuming step in three-dimensional (3D) reconstruction. In this paper, we propose a 3D point-based distributed SBA algorithm (DSBA) to improve the speed and scalability of SBA. The algorithm uses an asynchronously distributed sparse bundle adjustment (A-DSBA) to overlap data communication with equation computation. Compared with the synchronous DSBA mechanism (SDSBA), A-DSBA reduces the running time by 46%. The experimental results on several 3D reconstruction datasets reveal that our distributed algorithm running on eight nodes is up to five times faster than the stand-alone parallel SBA. Furthermore, the speedup of the proposed algorithm (running on eight nodes with 48 cores) is up to 41 times that of the serial SBA (running on a single node).

Keywords: Sparse bundle adjustment parallel distributed sparse bundle adjustment Three-dimensional reconstruction Asynchronous
