Search results - Inner Mongolia University Library

2015 IEEE 16th International Conference on Communication Technology (ICCT 2015)

Authors: Xiang Zhang, Naiyang Guan, Zhigang Luo. Affiliations: College of Computer, National University of Defense Technology; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology

Sparse coding has shown great potential in learning image feature representations. Recently developed methods such as group sparse coding discover group relationships among examples and have achieved state-of-the-art results in image classification. However, they suffer from poor robustness in practice. This paper proposes a robust weighted supervised sparse coding method (RWSSC) to address this ***, RWSSC distinguishes different classes' contributions to the sparse coding by a novel weighting strategy and removes outliers by imposing l1-regularization over the noisy entries. Benefiting from these strategies, RWSSC can effectively boost the performance of sparse coding in image ***, we developed a block coordinate descent algorithm to optimize it, and proved its *** results of image classification on two popular datasets show that RWSSC quantitatively outperforms representative sparse coding methods.

Keywords: Sparse coding, Supervised learning, Image classification
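
The abstract says that RWSSC removes outliers by imposing l1-regularization over the noisy entries. The listing does not give the RWSSC algorithm itself, so the following is only a minimal sketch of why an l1 penalty isolates outliers: the proximal operator of the l1 norm is entry-wise soft-thresholding, which shrinks small residuals to zero and keeps large ones as an explicit outlier term. The function name `soft_threshold`, the threshold `lam`, and the sample residual are assumptions made for illustration.

```python
def soft_threshold(values, lam):
    """Entry-wise proximal operator of lam * ||e||_1 (soft-thresholding)."""
    out = []
    for v in values:
        if v > lam:
            out.append(v - lam)   # large positive entry: shrink toward zero
        elif v < -lam:
            out.append(v + lam)   # large negative entry: shrink toward zero
        else:
            out.append(0.0)       # small entry: treated as noise-free
    return out


# Hypothetical reconstruction residual: two entries are clearly corrupted.
residual = [0.05, -0.02, 3.0, 0.01, -2.5]
outlier_term = soft_threshold(residual, lam=0.5)
print(outlier_term)
```

Only the two large entries survive in `outlier_term`, which is the behavior the abstract attributes to the l1 penalty.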

A low-latency fine-grained dynamic shared cache management scheme for chip multi-processor

IEEE International Conference on Performance, Computing and Communications (IPCCC)

Authors: Jinbo Xu, Weixia Xu, Zhengbin Pang. Affiliations: College of Computer, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

To use the shared last-level cache (LLC) in chip multi-processors (CMP) more efficiently, the partitioning of LLC resources among cores should offer low access latency, fine migration granularity, and low hardware complexity. This paper proposes a dynamic LLC management scheme to achieve these goals. The proposed scheme migrates cache resources among cores at the granularity of cache blocks instead of ways. The number of victim cache blocks that each victim core can migrate to target cores is determined by an eviction probability, which is calculated according to the performance goal. The victim cache blocks for a target core are then chosen from the nearest victim core that has a non-zero eviction probability, using an innovative E-Table structure introduced into the CMP. The eviction probabilities are updated periodically. With the help of E-Tables, the proposal achieves low-latency accesses by always keeping the required cache blocks near the target cores. Fine granularity is guaranteed by maintaining an eviction probability for each core. In addition, only minor hardware changes to the traditional cache structure are required. Simulation results suggest significant performance improvements of 6.8% to 22.7% over related work.

Keywords: Hardware, Resource management, Probability distribution, Proposals, Complexity theory, Simulation
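
The selection rule stated in the abstract (pick the nearest victim core with a non-zero eviction probability) can be sketched as follows. The abstract does not describe the E-Table layout or the on-chip topology, so this sketch assumes cores numbered row by row on a 2D mesh and uses Manhattan hop count as the distance; `nearest_victim`, `eviction_prob`, and `mesh_width` are hypothetical names.

```python
def nearest_victim(target, eviction_prob, mesh_width):
    """Return the closest core (other than target) whose eviction
    probability is non-zero, or None if no such core exists.

    Assumption: cores are numbered row-major on a 2D mesh and distance
    is the Manhattan hop count between mesh coordinates.
    """
    def hops(a, b):
        ax, ay = a % mesh_width, a // mesh_width
        bx, by = b % mesh_width, b // mesh_width
        return abs(ax - bx) + abs(ay - by)

    candidates = [c for c, p in eviction_prob.items() if p > 0 and c != target]
    if not candidates:
        return None
    # Break distance ties by the lower core id to keep the choice deterministic.
    return min(candidates, key=lambda c: (hops(target, c), c))


# Hypothetical 4x4 mesh: cores 5, 10 and 15 are currently victims.
probabilities = {0: 0.0, 1: 0.0, 5: 0.2, 10: 0.5, 15: 0.1}
print(nearest_victim(0, probabilities, mesh_width=4))
```

In hardware this lookup would presumably be a table read rather than a search; the loop here only illustrates the selection criterion.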

Poster: Segmentation Based Online Performance Problem Diagnosis

International Conference on Software Engineering (ICSE)

Authors: Jingwen Zhou, Zhenbang Chen, Ji Wang. Affiliations: College of Computer, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN: (print) 9781479919352

The performance problems of software systems are receiving increasing attention. Among the various diagnosis methods based on system traces, principal component analysis (PCA) based methods are widely used because they produce accurate diagnosis results and require no specific domain knowledge. However, our experiments confirm several shortcomings of PCA-based methods: they require traces with the same call sequence, they are inefficient when traces are long, and they miss some performance problems. To cope with these issues, we introduce a segmentation-based online diagnosis method in this poster.

Keywords: Principal component analysis, Software systems, Accuracy, Measurement, Monitoring, Computer architecture, Conferences

MPISE: Symbolic Execution of MPI Programs

IEEE International Symposium on High Assurance Systems Engineering

Authors: Xianjin Fu, Zhenbang Chen, Yufeng Zhang, Chun Huang, Wei Dong, Ji Wang. Affiliations: College of Computer, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN: (print) 9781479981120

The Message Passing Interface (MPI) plays an important role in parallel computing, and many parallel applications are implemented as MPI programs. Existing bug detection methods for MPI programs fall short of providing both input coverage and non-determinism coverage, which leads to missed bugs. In this paper, we employ symbolic execution to ensure input coverage, and propose an on-the-fly scheduling algorithm that reduces the interleavings explored for non-determinism coverage while ensuring soundness and completeness. We have implemented our approach as a tool called MPISE, which can automatically detect deadlocks and runtime bugs in MPI programs. Experimental results on benchmark programs and real-world MPI programs indicate that MPISE finds bugs effectively and efficiently. In addition, the tool provides diagnostic information and a replay mechanism to help understand bugs.

Keywords: Computer bugs, System recovery, Switches, Synchronization, Runtime, Testing, Scheduling

Efficient Distributed Data Clustering on Spark

IEEE International Conference on Cluster Computing

Authors: Jia Li, Dongsheng Li, Yiming Zhang. Affiliation: National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha, China

ISBN: (print) 9781467365994

Data clustering is usually time-consuming because it must iteratively aggregate and process large volumes of data. Approximate aggregation based on samples provides fast, quality-assured results. In this paper, we propose applying approximation techniques to data clustering to trade off clustering efficiency against result quality, along with online accuracy estimation. The proposed method is based on bootstrap trials. We implemented this method as an Intelligent Bootstrap Library (IBL) on Spark to support efficient data clustering. Extensive evaluations show that IBL can provide a 2x speed-up over the state-of-the-art solution with the same error bound.

Keywords: Sparks, Accuracy, Data mining, Estimation error, Distributed databases, Approximation methods
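
The abstract says the method rests on bootstrap trials that yield an approximate aggregate together with an accuracy estimate. The listing does not describe IBL's API, so the following is only a generic percentile-bootstrap sketch in plain Python, not the Spark library itself; `bootstrap_mean` and its parameters are hypothetical names.

```python
import random
import statistics


def bootstrap_mean(sample, trials=1000, confidence=0.95, seed=0):
    """Estimate the mean from a sample and attach a percentile-bootstrap
    confidence interval as the accuracy estimate."""
    rng = random.Random(seed)
    n = len(sample)
    # Each trial resamples the sample with replacement and re-aggregates it.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(trials)
    )
    low_idx = int((1 - confidence) / 2 * trials)
    high_idx = int((1 + confidence) / 2 * trials) - 1
    return statistics.fmean(sample), means[low_idx], means[high_idx]


# Hypothetical sample drawn from a much larger data set.
sample = list(range(1, 101))
estimate, low, high = bootstrap_mean(sample)
print(f"mean ~ {estimate}, 95% interval [{low}, {high}]")
```

The width of the interval is what lets a caller decide whether the sample is large enough or more data must be processed, which is the efficiency/quality trade-off the abstract refers to.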

A Novel Run-time Load Balancing Method for MapReduce

International Conference on Computer Science and Network Technology

Authors: Zhihong Liu, Yaping Liu, Baosheng Wang, Zhenghu Gong. Affiliations: College of Computer, National University of Defense Technology, Changsha, Hunan, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan, China

ISBN: (print) 9781467381741

In recent years, many companies have embraced the Hadoop MapReduce system for large-scale data processing with completion-time constraints. However, existing Hadoop schedulers still suffer from reducer load imbalance. In this paper, we present a novel run-time load balancing method for MapReduce. Our approach predicts the workload of each reduce task at run time and dynamically assigns reduce tasks to specific machines based on the estimated workloads, thereby balancing load among machines. The experimental results show that our approach predicts the workload of reduce tasks with high accuracy and improves job completion time by up to 23.15%.

Keywords: MapReduce, Load balancing, Task scheduling, Completion time, Runtime, Workload, Religious Missions
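
The abstract describes assigning reduce tasks to machines according to their estimated workloads but does not give the assignment rule. As a stand-in, the sketch below uses the classic greedy heuristic (largest task first, always onto the currently least-loaded machine); it is not the paper's method, and `assign_reduce_tasks` and the sample loads are hypothetical.

```python
import heapq


def assign_reduce_tasks(estimated_load, num_machines):
    """Greedy heuristic: place the heaviest remaining reduce task on the
    machine with the smallest accumulated estimated load."""
    heap = [(0, machine) for machine in range(num_machines)]
    heapq.heapify(heap)
    assignment = {}
    for task, load in sorted(estimated_load.items(), key=lambda kv: -kv[1]):
        total, machine = heapq.heappop(heap)      # least-loaded machine
        assignment[task] = machine
        heapq.heappush(heap, (total + load, machine))
    return assignment


# Hypothetical workload estimates for five reduce tasks.
loads = {"r0": 8, "r1": 7, "r2": 6, "r3": 5, "r4": 4}
print(assign_reduce_tasks(loads, num_machines=2))
```

The heuristic is fast and usually close to balanced, but it is not guaranteed to be optimal; how well it works in practice depends on the accuracy of the workload prediction.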

MilkyWay-2 supercomputer: system and application

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 345-356

Authors: Xiangke LIAO, Liquan XIAO, Canqun YANG, Yutong LU. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes that integrate commodity off-the-shelf processors and accelerators sharing a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communication, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.

Keywords: MilkyWay-2 supercomputer, petaflops computing, neo-heterogeneous architecture, interconnect network, heterogeneous programming model, system management, benchmark optimization, performance evaluation

Comparison of heavy-ion induced SEU for D- and TMR-flip-flop designs in 65-nm bulk CMOS technology

Science China (Information Sciences), 2014, Vol. 57, No. 10, pp. 223-229

Authors: HE YiBai, CHEN ShuMing. Affiliations: School of Computer Science, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

Heavy-ion experiments were performed on a D flip-flop (DFF) and a TMR flip-flop (TMRFF) fabricated in a 65-nm bulk CMOS process. The experimental results show that the TMRFF has an SEU cross-section about 92% lower than that of the standard DFF design in static test mode. In dynamic test mode, the TMRFF shows a much stronger frequency dependency than the DFF design, which reduces its advantage over the DFF at higher operating frequencies. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such a small improvement in the SEU performance of the TMR design may warrant reconsidering its use in hardened designs.

Keywords: SEU, flip-flop, TMR, heavy-ion, frequency
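
The abstract reports the static result as a percentage reduction in cross-section and the 160 MHz result as a hardness factor. A small conversion puts both on the same scale. It assumes that "k× harder" means the DFF cross-section divided by the TMRFF cross-section equals k, which is an interpretation and is not stated in the abstract; the function names are hypothetical.

```python
def hardness_factor(reduction):
    """Convert a fractional cross-section reduction into a hardness factor.

    Assumption: factor = sigma_DFF / sigma_TMRFF = 1 / (1 - reduction).
    """
    return 1.0 / (1.0 - reduction)


def reduction_from_factor(factor):
    """Inverse conversion: hardness factor to fractional reduction."""
    return 1.0 - 1.0 / factor


static_factor = hardness_factor(0.92)            # static mode, about 92% lower
dynamic_reduction = reduction_from_factor(3.2)   # 160 MHz, 3.2x harder
print(f"static mode: about {static_factor:.1f}x harder")
print(f"160 MHz: about {dynamic_reduction * 100:.2f}% lower cross-section")
```

Under this reading, the advantage drops from roughly 12.5× in static mode to 3.2× at 160 MHz, which is the loss the abstract points to.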

The TH Express high performance interconnect networks

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 357-366

Authors: Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

The interconnection network plays an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth, low-latency interprocessor communication, and continuous effort is devoted to developing our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with an emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offloaded collective operations, and hardware-reliable end-to-end communication. The design of a low-level message passing infrastructure and an upper-level message passing service is also proposed. Preliminary performance results demonstrate the efficiency of the TH interconnect interface.

Keywords: HPC, network interface chip (NIC), TH Express interconnect, offload collective operation

Constraint-Relaxation Approach for Nonnegative Matrix Factorization: A Case Study

IEEE International Conference on Systems, Man and Cybernetics

Authors: Jue Wang, Naiyang Guan, Xuhui Huang, Zhigang Luo. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Hunan, P.R. China; Department of Computer Science and Technology, College of Computer, National University of Defense Technology, Hunan, P.R. China

ISBN: (print) 9781479986989

Nonnegative matrix factorization (NMF) is a powerful technique for dimensionality reduction. Conventional NMF algorithms usually keep the matrices W and H nonnegative while iterating. However, to obtain the NMF of a matrix, it is unnecessary to force the intermediate solutions to be nonnegative. In this paper, we propose a two-stage approach to NMF. At the relaxation stage, the nonnegativity constraint on intermediate solutions is relaxed and a real-valued matrix factorization is generated. At the constraint stage, the real-valued matrix factorization is transformed into a nonnegative matrix factorization by an invertible linear transformation. Based on this approach, we study exact nonnegative matrix factorization when rank=2. We prove that, given two real-valued matrices of rank=2, there exists an invertible linear transformation that transforms them into nonnegative matrices while leaving their product unchanged. We propose an algorithm to find this transformation. When the rank is higher than 2, such a transformation may not exist. Experiments show that this approach can reach a nonnegative matrix factorization with lower reconstruction error than conventional methods, and that the technique for rank=2 exact NMF works well.

Keywords: Algorithm design and analysis, Matrix decomposition, Additives, Computers, Transforms, Matrix converters, Clustering algorithms
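
The constraint stage relies on the identity (WQ)(Q^-1 H) = WH for any invertible Q, so the product is unchanged while the signs of the factors can change. The rank-2 sketch below demonstrates this identity with a hand-picked Q; the paper's algorithm for finding Q is not given in the abstract and is not reproduced here. The matrices and helper names are assumptions chosen for illustration.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse_2x2(q):
    """Exact inverse of a 2x2 matrix using rational arithmetic."""
    (a, b), (c, d) = q
    det = a * d - b * c
    if det == 0:
        raise ValueError("Q must be invertible")
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]


# Real-valued rank-2 factorization: W contains a negative entry.
W = [[1, -1], [0, 1], [1, 0]]
H = [[1, 3, 3], [0, 1, 3]]

# Hand-picked invertible transformation (not computed by the paper's algorithm).
Q = [[1, 1], [0, 1]]

W_nn = matmul(W, Q)               # transformed left factor
H_nn = matmul(inverse_2x2(Q), H)  # transformed right factor

print(W_nn, H_nn)
print(matmul(W_nn, H_nn) == matmul(W, H))
```

Both transformed factors are entry-wise nonnegative here, and their product equals the original product exactly because the arithmetic is rational.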
