Non-volatile random-access memory (NVRAM) technology is maturing rapidly, and its byte-persistence feature allows the design of new and efficient fault tolerance mechanisms. In this paper we propose the versionized process (VerP), a new process model based on NVRAM that is natively non-volatile and fault tolerant. We introduce an intermediate software layer that allows us to run a process directly on NVRAM and to put all the process states into NVRAM, and then propose a mechanism to versionize all the process data. Each piece of the process data is given a special version number, which increases with the modification of that piece of data. The version number can effectively help us trace the modification of any data and recover it to a consistent state after a system crash. Compared with traditional checkpoint methods, our work can achieve fine-grained fault tolerance at very little cost.
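The abstract does not include code; the following is a minimal Python sketch of the per-datum versioning idea it describes. All names (VersionedCell, commit, recover) are hypothetical illustrations, not the paper's API.

```python
# Minimal sketch of per-datum versioning (hypothetical names, not the paper's API).
# Each piece of data carries a version number that grows with every modification;
# after a crash, data whose version exceeds the last consistent version is rolled
# back to the last consistent snapshot.

class VersionedCell:
    def __init__(self, value):
        self.version = 0               # increases with every modification
        self.value = value
        self._snapshot = (0, value)    # last version/value known to be consistent

    def write(self, value):
        self.version += 1              # versionize the modification
        self.value = value

    def commit(self):
        # mark the current version as part of a consistent state
        self._snapshot = (self.version, self.value)

    def recover(self):
        # after a crash, discard modifications newer than the last consistent version
        self.version, self.value = self._snapshot


cell = VersionedCell(42)
cell.write(43)
cell.commit()          # version 1 is consistent
cell.write(44)         # version 2 was in flight when the "crash" happened
cell.recover()
assert (cell.version, cell.value) == (1, 43)
```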
Sparse bundle adjustment (SBA) is a key but time- and memory-consuming step in three-dimensional (3D) reconstruction. In this paper, we propose a 3D point-based distributed SBA algorithm (DSBA) to improve the speed and scalability of SBA. The algorithm uses an asynchronously distributed sparse bundle adjustment (A-DSBA) to overlap data communication with equation computation. Compared with the synchronous DSBA mechanism (SDSBA), A-DSBA reduces the running time by 46%. The experimental results on several 3D reconstruction datasets reveal that our distributed algorithm running on eight nodes is up to five times faster than the stand-alone parallel SBA. Furthermore, the speedup of the proposed algorithm (running on eight nodes with 48 cores) is up to 41 times that of the serial SBA (running on a single node).
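To make the asynchronous overlap concrete, here is a heavily simplified Python sketch of the idea of hiding communication behind computation. The helper names and the thread-based exchange are assumptions for illustration; the actual A-DSBA implementation works on distributed normal equations, not toy lists.

```python
# Sketch of the communication/computation overlap in A-DSBA (illustrative only):
# while one iteration's local equations are being solved, the exchange of the
# previous iteration's result proceeds in a background thread.

import threading

def solve_local_equations(block):
    # placeholder for the per-node reduced camera/point system solve
    return sum(block)

def exchange_with_neighbours(result, out):
    # placeholder for non-blocking communication (MPI_Isend/Irecv in practice)
    out.append(result)

received = []
pending = None
for block in ([1, 2], [3, 4], [5, 6]):            # per-iteration local data
    local = solve_local_equations(block)           # computation
    if pending is not None:
        pending.join()                             # finish last iteration's exchange
    pending = threading.Thread(target=exchange_with_neighbours,
                               args=(local, received))
    pending.start()                                # communication overlaps next solve
if pending is not None:
    pending.join()
print(received)
```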
Network monitoring is vital in modern clouds and data center networks for traffic engineering, network diagnosis, and network intrusion detection, which need diverse traffic statistics ranging from flow size distributions ...
As we approach the exascale era in supercomputing, designing a balanced computer system with a powerful computing ability and low power requirements has become increasingly important. The graphics processing unit (GPU) is an accelerator used widely in most recent supercomputers. It adopts a large number of threads to hide long latency with high energy efficiency. In contrast to their powerful computing ability, GPUs have only a few megabytes of fast on-chip memory storage per streaming multiprocessor (SM). The GPU cache is inefficient due to a mismatch between the throughput-oriented execution model and the cache hierarchy design. At the same time, current GPUs fail to handle burst-mode long-access latency due to the GPU's poor warp scheduling methods. Thus, the benefits of the GPU's high computing ability are reduced dramatically by the poor cache management and warp scheduling methods, which limit the system performance and energy efficiency. In this paper, we put forward a coordinated warp scheduling and locality-protected (CWLP) cache allocation scheme to make full use of data locality and hide latency. We first present a locality-protected cache allocation method based on the instruction program counter (LPC) to promote cache performance. Specifically, we use a PC-based locality detector to collect the reuse information of each cache line and employ a prioritised cache allocation unit (PCAU) which coordinates the data reuse information with the time-stamp information to evict the lines with the least reuse possibility. Moreover, the locality information is used by the warp scheduler to create an intelligent warp reordering scheme to capture locality and hide latency. Simulation results show that CWLP provides a speedup of up to 19.8% and an average improvement of 8.8% over the baseline methods.
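The following toy Python model illustrates the flavour of PC-based locality-protected allocation: a per-PC reuse counter estimates how likely lines loaded by that instruction are to be reused, and eviction prefers lines with the lowest expected reuse, breaking ties by the oldest time stamp. The class and policy details are assumptions for illustration, not the simulator or exact policy used in the paper.

```python
# Toy model of locality-protected cache allocation (hypothetical names).

class LocalityProtectedCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = {}          # address -> (loading_pc, last_access_time)
        self.pc_reuse = {}       # pc -> observed reuse count for that instruction
        self.clock = 0

    def access(self, address, pc):
        self.clock += 1
        if address in self.lines:
            # a hit means this PC exhibits locality; raise its reuse score
            self.pc_reuse[pc] = self.pc_reuse.get(pc, 0) + 1
            self.lines[address] = (pc, self.clock)
            return "hit"
        if len(self.lines) >= self.capacity:
            # evict the line with the least expected reuse, oldest first
            victim = min(self.lines,
                         key=lambda a: (self.pc_reuse.get(self.lines[a][0], 0),
                                        self.lines[a][1]))
            del self.lines[victim]
        self.lines[address] = (pc, self.clock)
        return "miss"


cache = LocalityProtectedCache(capacity=2)
print([cache.access(a, pc) for a, pc in [(0x10, 1), (0x20, 2), (0x10, 1), (0x30, 3)]])
```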
Vehicle privacy protection plays a vital role in the releasing or sharing of traffic videos. The license plate, as the identifiable mark of a vehicle, contains its most sensitive information. Therefore, masking license plates is a common way to protect the privacy of the corresponding vehicles. However, in real-world scenarios, it is often hard to locate the small and shifting license plates, and therefore precise and cost-effective privacy protection is quite challenging. To address this problem in surveillance video, we fully explore all available spatio-temporal cues and design a bidirectional Kalman filter model over consecutive frames to locate missing license plates. To verify the effectiveness of the proposed method, we build a new License Plates Privacy-preserving Dataset (LPPD) collected from various scenes with diverse privacy and utility annotations. We demonstrate that the proposed method shows a very promising capability of privacy protection on the real-world dataset without sacrificing its utility.
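A hedged sketch of the bidirectional filtering idea follows: a constant-velocity Kalman filter run over detected plate positions, with missed detections filled by prediction, executed forward and backward in time and averaged. The state model, noise parameters, and fusion rule are illustrative assumptions; the paper's model may differ.

```python
# Bidirectional constant-velocity Kalman filtering over plate positions (1-D sketch).

import numpy as np

def kalman_1d(observations, q=1e-2, r=1.0):
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state: position + velocity
    H = np.array([[1.0, 0.0]])
    x = np.array([observations[0] if observations[0] is not None else 0.0, 0.0])
    P = np.eye(2)
    out = []
    for z in observations:
        x = F @ x
        P = F @ P @ F.T + q * np.eye(2)       # predict
        if z is not None:                     # update only when the plate was detected
            S = H @ P @ H.T + r
            K = P @ H.T / S
            x = x + (K * (z - H @ x)).ravel()
            P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)

detections = [10.0, 11.1, None, None, 14.2, 15.0]   # plate x-centers; None = missed
forward = kalman_1d(detections)
backward = kalman_1d(detections[::-1])[::-1]
filled = (forward + backward) / 2.0                  # bidirectional estimate
print(np.round(filled, 2))
```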
Convolutional neural network models are widely used in image classification tasks. However, the running time of such models is so long that it does not conform to the strict real-time requirements of mobile devices. ...
Massive multiple-input multiple-output provides improved energy efficiency and spectral efficiency in 5G. However, it requires large-scale matrix computation with tremendous complexity, especially for data detection and precoding. Recently, many detection and precoding methods have been proposed using approximate iteration methods, which meet the demand for precision with low complexity. In this paper, we compare these approximate iteration methods in precision and complexity, and then improve them with iteration refinement at the cost of little extra complexity and no extra hardware resources. By derivation, our proposal is in essence a combination of three approximate iteration methods, and it provides a remarkable precision improvement on the desired vectors. The results show that our proposal provides a 27%-83% normalized mean-squared error improvement of the detection symbol vector and the precoding symbol vector. Moreover, we find that the bit-error rate is mainly controlled by soft-input soft-output Viterbi decoding when approximate iteration methods are used. Further, considering only the effect on soft-input soft-output Viterbi decoding, the simulation results show that using a rough estimate of the filter matrix of minimum mean square error detection to calculate the log-likelihood ratio can provide sufficiently good bit-error rate performance, especially when the ratio of the number of base station antennas to the number of users is not too large.
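For readers unfamiliar with approximate iteration methods for MMSE detection, here is a hedged numpy sketch: the MMSE filter matrix is inverted approximately with a two-term Neumann series, and one iterative-refinement step reduces the residual error. The splitting, number of series terms, and refinement schedule are illustrative choices, not the exact configuration evaluated in the paper.

```python
# MMSE detection with an approximate (Neumann-series) inverse plus refinement.

import numpy as np

rng = np.random.default_rng(0)
B, U, sigma2 = 64, 8, 0.1                      # BS antennas, users, noise variance
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
x_true = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], U) / np.sqrt(2)
y = H @ x_true + np.sqrt(sigma2 / 2) * (rng.standard_normal(B) + 1j * rng.standard_normal(B))

A = H.conj().T @ H + sigma2 * np.eye(U)        # MMSE filter matrix
b = H.conj().T @ y

# 2-term Neumann series around the diagonal: A^-1 ~= D^-1 - D^-1 E D^-1
D_inv = np.diag(1.0 / np.diag(A))
E = A - np.diag(np.diag(A))
A_inv_approx = D_inv - D_inv @ E @ D_inv

x_approx = A_inv_approx @ b                                  # approximate detection
x_refined = x_approx + A_inv_approx @ (b - A @ x_approx)     # one refinement step

x_exact = np.linalg.solve(A, b)
for name, x in [("approx", x_approx), ("refined", x_refined)]:
    nmse = np.linalg.norm(x - x_exact) ** 2 / np.linalg.norm(x_exact) ** 2
    print(name, f"NMSE = {nmse:.2e}")          # refinement should lower the NMSE
```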
Mixed-type data are pervasive in real life, but very limited outlier detection methods are available for such data. Some existing methods handle mixed-type data by feature conversion, whereas their performance is downgraded by the information loss and noise caused by the transformation. Another kind of approach separately evaluates outlierness in numerical and categorical features. However, these approaches fail to adequately consider the behaviours of data objects in different feature spaces, often leading to suboptimal results. As for outlier form, both clustered outliers and scattered outliers are contained in many real-world data, but a number of outlier detectors are inherently restricted by their outlier definitions from simultaneously detecting both of them. To address these issues, an unsupervised outlier detection method, MIX, is proposed. MIX constructs a joint learning framework that establishes a cooperation mechanism to make the separate outlier scorings constantly communicate and sufficiently grasp the behaviours of data objects in the other feature space. Specifically, MIX iteratively performs outlier scoring in the numerical and categorical spaces. Each outlier scoring phase can be iteratively and cooperatively enhanced by the prior knowledge given by the other feature space. To target both clustered and scattered outliers, the outlier scoring phases capture the essential characteristic of outliers, i.e., evaluating outlierness via the deviation from the normal model. We show that MIX significantly outperforms eight state-of-the-art outlier detectors on twelve real-world datasets and obtains good scalability.
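The following is a highly simplified Python sketch of the alternating, mutually-informed scoring idea described above: numerical outlierness is measured as deviation from a weighted "normal model", categorical outlierness as weighted rarity of values, and each phase re-weights the other using the latest scores. The scoring functions and coupling here are illustrative assumptions, far simpler than the actual MIX framework.

```python
# Alternating numerical/categorical outlier scoring with a shared weight prior.

import numpy as np

def mix_like_scores(X_num, X_cat, n_iter=5):
    n = len(X_num)
    weights = np.ones(n) / n                              # prior: all objects normal
    for _ in range(n_iter):
        # numerical phase: deviation from the weighted "normal model"
        mu = np.average(X_num, axis=0, weights=weights)
        num_score = np.linalg.norm(X_num - mu, axis=1)
        # categorical phase: weighted rarity of each object's values
        cat_score = np.zeros(n)
        for j in range(X_cat.shape[1]):
            freq = {}
            for v, w in zip(X_cat[:, j], weights):
                freq[v] = freq.get(v, 0.0) + w
            cat_score += np.array([1.0 - freq[v] for v in X_cat[:, j]])
        combined = num_score / num_score.max() + cat_score / cat_score.max()
        weights = 1.0 / (1.0 + combined)                  # outliers get small weight
        weights /= weights.sum()
    return combined

X_num = np.array([[0.0], [0.1], [0.2], [5.0]])            # last object is an outlier
X_cat = np.array([["a"], ["a"], ["a"], ["b"]])
print(np.round(mix_like_scores(X_num, X_cat), 2))
```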
As the parallel scale of HPC applications represented by earth system models becomes larger and the computing cost becomes higher, the performance of HPC applications is increasingly critical. Profiling HPC applications accurately helps to model the applications and find performance bottlenecks. However, due to the complexity of HPC applications, the diversity of programming languages, the differences in individual programming habits, and the multiplicity of architectures, accurate profiling becomes very tough. In this paper, we propose LPerf, a low-overhead and high-accuracy profiler for HPC applications. To reduce the profiling overhead and improve the profiling accuracy, we propose a preprocessing method which can automatically instrument with tunable granularity, thus significantly reducing the run-time overhead of profiling; an aggregated caller-callee relationship which is used to locate the relationships between functions efficiently; and a profiling-aware method which can precisely calculate the running time of functions. The experimental results show that the error rate of profiling reaches 0.02% and the overhead reaches 1.6% in the earth system model named CAS-ESM. Compared with the baselines, the precision, accuracy, and overhead of LPerf reach the state of the art.
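As a rough, language-level analogue of instrumented profiling with an aggregated caller-callee view, the Python sketch below hooks call and return events and accumulates inclusive time per (caller, callee) pair. LPerf itself instruments compiled HPC codes at tunable granularity; this sketch only illustrates the bookkeeping idea.

```python
# Aggregating inclusive time per (caller, callee) pair via call/return events.

import sys
import time
from collections import defaultdict

edge_time = defaultdict(float)     # (caller, callee) -> aggregated inclusive seconds
stack = []                         # (function_name, start_time)

def tracer(frame, event, arg):
    name = frame.f_code.co_name
    if event == "call":
        stack.append((name, time.perf_counter()))
    elif event == "return" and stack:
        callee, start = stack.pop()
        caller = stack[-1][0] if stack else "<root>"
        edge_time[(caller, callee)] += time.perf_counter() - start

def inner():
    time.sleep(0.01)

def outer():
    for _ in range(3):
        inner()

sys.setprofile(tracer)
outer()
sys.setprofile(None)

for (caller, callee), seconds in sorted(edge_time.items()):
    print(f"{caller} -> {callee}: {seconds * 1e3:.1f} ms")
```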
In image classification, Convolutional Neural Network (CNN) models have achieved high performance with the rapid development of deep learning. However, some categories in the image datasets are more difficult to distin...