Search results - Inner Mongolia University Library

International Conference on Electronic Engineering and Information Science, ICEEIS 2015

Authors: Shi, Y.; Lu, J.; Zheng, J.J. Affiliations: College of Computer Science and Technology, Heilongjiang University, China; College of Computer Science and Technology, Heilongjiang University, Key Laboratory of Database and Parallel Computing of Heilongjiang Province, Harbin, China

ISBN: (print) 9781138027725

This paper processes and analyzes the promoters of Arabidopsis and poplar in parallel. The main work consists of three parts. First, the motif data are matched in parallel against the promoter data of both plants using OpenMP, and the matching result is then processed by frequent mining techniques; the experimental results show that the OpenMP-based parallel processing time is far less than the serial time. Second, to increase calculation accuracy, the algorithm for computing the P value is improved by the prime split method. Finally, based on the frequent mining results, the frequent items shared by the two plants are analyzed. © 2015 Taylor & Francis Group, London.

Keywords: Application programming interfaces (API)
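
The pipeline described in the abstract above (match motifs against each promoter, count how often each motif occurs, then intersect the frequent sets of the two plants) can be sketched roughly as follows. This is only an illustration under stated assumptions: exact substring matching, a simple support count standing in for the unspecified frequent mining step, a thread pool standing in for the OpenMP loop, and made-up toy sequences. None of the names below come from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def match_motifs(promoter, motifs):
    """Return the set of motifs that occur as exact substrings of one promoter."""
    return {m for m in motifs if m in promoter}


def frequent_motifs(promoters, motifs, min_support):
    """Return the motifs found in at least `min_support` promoters."""
    # Each promoter is matched independently, which is the property that
    # makes the matching loop a candidate for a parallel-for style split.
    with ThreadPoolExecutor() as pool:
        hits = list(pool.map(lambda p: match_motifs(p, motifs), promoters))
    counts = {m: 0 for m in motifs}
    for found in hits:
        for m in found:
            counts[m] += 1
    return {m for m, c in counts.items() if c >= min_support}


# Hypothetical toy data, invented purely for illustration.
motifs = ["TATA", "CAAT", "GATA"]
plant_a = ["GGTATAAC", "CCTATAGG", "ACAATGTT"]
plant_b = ["TTTATACC", "GATAGATA", "AATATAGC"]

shared = frequent_motifs(plant_a, motifs, 2) & frequent_motifs(plant_b, motifs, 2)
print(shared)
```

In CPython a thread pool does not speed up CPU-bound matching; it is used here only to show where the independent per-promoter work sits.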

Unified virtual memory support for deep CNN accelerator on SoC FPGA

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Xiao, Tao; Qiao, Yuran; Shen, Junzhong; Yang, Qianming; Wen, Mei. Affiliations: College of Computer, National University of Defense Technology, Changsha 410073, China; National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China

ISBN: (digital) 9783319271194

ISBN: (print) 9783319271187

Cooperation between a CPU and a hardware accelerator on an SoC FPGA to accomplish computationally intensive tasks provides significant advantages in performance and energy efficiency. However, current operating systems provide little support for accelerators: the OS is unaware that a computational task can be executed on either a CPU core or an accelerator, and it provides no assistance in efficiently managing data sharing between the CPU and the accelerator in DRAM, such as zero copy and data coherence. It is also hard for a current OS to allocate a large contiguous physical memory space for an accelerator. In this paper, we select the Xilinx ZYNQ as the target and qualitatively analyze methods of sharing data. Besides using the high-performance (HP) AXI interfaces of the ZYNQ device, we develop a novel memory management system for FPGA-based accelerators. It provides a unified virtual space for the CPU cores and the accelerator so that they can access the same memory space in the operating system's user space. For a deep convolutional neural network task, our design achieves a speed-up of up to 5.34x compared to traditional processor-accelerator cooperation. © Springer International Publishing Switzerland 2015.

Keywords: Energy efficiency

Running mechanism and implementation technique of self-adaptive software in open environment

Jisuanji Xuebao/Chinese Journal of Computers, 2015, Vol. 38, No. 9, pp. 1893-1906

Authors: Mao, Xin-Jun; Dong, Meng-Gao; Qi, Zhi-Chang; Yin, Jun-Wen. Affiliations: Department of Computer Science and Technology, College of Computer, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha 410073, China; Laboratory of Science and Technology on Integrated Logistic Support, National University of Defense Technology, Changsha 410073, China

Due to the uncertainty and unpredictability of environment changes, it is a great challenge to develop self-adaptive systems in an open environment. First, it is difficult for developers to clearly predict the various environment changes and precisely define self-adaptation requirements at design time. Second, many self-adaptation decisions must be made by the system at run time. To deal with these problems, the paper presents an approach based on software agent technology and the organization metaphor to support the development and running of such systems. The approach enables developers to describe self-adaptive systems and investigate self-adaptation in terms of high-level organization abstractions. A self-adaptation mechanism called role dynamic binding is designed, and on-line self-adaptation is achieved by introducing reinforcement learning. The paper details the on-line self-adaptation decision algorithm, which integrates the dynamic binding mechanism with reinforcement learning. In particular, a general-purpose and systematic software engineering solution for developing such systems is provided, including a self-adaptive software model, an implementation framework, a structured process, and the supporting software environment SADE+. A case study illustrates the approach and validates its effectiveness. © 2015, Jisuanji Xuebao/Chinese Journal of Computers. All rights reserved.

Keywords: Dynamics

Two-dimensional Euler PCA for face recognition

21st International Conference on MultiMedia Modeling, MMM 2015

Authors: Tan, Huibin; Zhang, Xiang; Guan, Naiyang; Tao, Dacheng; Huang, Xuhui; Luo, Zhigang. Affiliations: Science and Technology on Parallel Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China; Department of Computer Science and Technology, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China; Centre for Quantum Computation and Intelligent Systems and the Faculty of Engineering and Information Technology, University of Technology Sydney, 235 Jones Street, Ultimo, NSW 2007, Australia

ISBN: (print) 9783319144412

Principal component analysis (PCA) projects data onto the directions of maximal variance. Since PCA is quite effective in dimension reduction, it has been widely used in computer vision. However, conventional PCA suffers from the following deficiencies: 1) it incurs a high computational cost when handling high-dimensional data, and 2) it cannot reveal the nonlinear relationships among different features of the data. To overcome these deficiencies, this paper proposes an efficient two-dimensional Euler PCA (2D-ePCA) algorithm. In particular, 2D-ePCA learns the projection matrix on the 2D pixel matrix of each image without reshaping it into a long 1D vector, and it uncovers nonlinear relationships among features by mapping the data onto a complex representation. Since such a 2D complex representation induces a much smaller kernel matrix and principal subspaces, 2D-ePCA incurs much less computational overhead than Euler PCA on large-scale datasets. Experimental results on popular face datasets show that 2D-ePCA outperforms the representative algorithms in terms of accuracy, computational overhead, and robustness. © Springer International Publishing Switzerland 2015.

Keywords: Principal component analysis

Non-negative low-rank and group-sparse matrix factorization

21st International Conference on MultiMedia Modeling, MMM 2015

Authors: Wu, Shuyi; Zhang, Xiang; Guan, Naiyang; Tao, Dacheng; Huang, Xuhui; Luo, Zhigang. Affiliations: Science and Technology on Parallel Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China; Department of Computer Science and Technology, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China; Centre for Quantum Computation & Intelligent Systems and the Faculty of Engineering and Information Technology, University of Technology Sydney, 235 Jones Street, Ultimo, NSW 2007, Australia

ISBN: (print) 9783319144412

Non-negative matrix factorization (NMF) has been a popular data analysis tool and has been widely applied in computer vision. However, conventional NMF methods cannot adaptively learn grouping structure from a *** paper proposes a non-negative low-rank and group-sparse matrix factorization (NLRGS) method to overcome this deficiency. In particular, NLRGS captures the relationships among examples by constraining the rank of the coefficients while identifying the grouping structure via group sparsity regularization. With both constraints, NLRGS boosts NMF in both classification and clustering. However, NLRGS is difficult to optimize because it must deal with the low-rank constraint. To relax this hard constraint, we approximate the low-rank constraint with the nuclear norm and then develop an optimization algorithm for NLRGS in the framework of the augmented Lagrangian method (ALM). Experimental results for both face recognition and clustering on four popular face datasets demonstrate the effectiveness of NLRGS quantitatively. © Springer International Publishing Switzerland 2015.

Keywords: Non-negative matrix factorization

Accelerating FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture

IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, PACRIM 2015

Authors: Wang, Qinglin; Liu, Jie; Cui, Xiantao; Fu, Guitao; Gong, Chunye; Xing, Zuocheng. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; School of Computer Science, National University of Defense Technology, Changsha 410073, China; Beijing Satellite Navigation Center, Beijing 100094, China

ISBN: (print) 9781467377881

The coupling of microwaves into apertures plays an important part in many fields of electromagnetic physics and engineering. When the apertures are very narrow, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. As a many-core architecture, Intel's Many Integrated Core (MIC) architecture provides 512-bit vector units and more than 200 threads. In this paper, we parallelize the FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture. In the implementation, the OpenMP parallel programming model is used to exploit thread parallelism, while loop unrolling and SIMD intrinsic functions are used to accomplish vectorization. Compared with the serial version on an Intel Xeon E5-2670 CPU, the implementation on the MIC coprocessor with 57 cores obtains a speedup of 11.57 times. The experimental results also demonstrate that the parallelization scales well in performance. Additionally, the paper reports how the binding between OpenMP threads and hardware threads on the MIC influences performance. © 2015 IEEE.

Keywords: Scalability
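
To make the structure of an FDTD time step concrete, here is a minimal one-dimensional leapfrog sketch. It is not the paper's three-dimensional narrow-slot solver: the grid size, the normalized Courant factor of 0.5, the Gaussian source, and the function name `fdtd_1d` are all assumptions chosen for illustration. It shows only that each field update reads neighbouring values of the other field, so the inner loops carry no cross-iteration dependence; loops of that kind are what a thread-parallel, vectorized implementation would split up.

```python
import math


def fdtd_1d(n_cells=200, n_steps=80, source_pos=100):
    """Leapfrog update of Ez and Hy on a 1D grid in normalized units."""
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    courant = 0.5  # assumed normalized time step (at most 1 for stability in 1D)
    for step in range(n_steps):
        # Magnetic-field update: hy[k] reads only ez[k] and ez[k + 1].
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Electric-field update: ez[k] reads only hy[k - 1] and hy[k].
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Additive Gaussian pulse injected at one grid point.
        ez[source_pos] += math.exp(-0.5 * ((step - 30) / 8.0) ** 2)
    # ez[0] and hy[n_cells - 1] are never updated, so both ends reflect.
    return ez


field = fdtd_1d()
print(max(abs(v) for v in field))
```

With these defaults the pulse has not yet reached either end of the grid, so the field stays mirror-symmetric about the source point.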

Cooperative monitoring BGP among autonomous systems

Security and Communication Networks, 2015, Vol. 8, No. 10, pp. 1943-1957

Authors: Hu, Ning; Wang, BaoSheng; Liu, Xin. Affiliation: National Key Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha, Hunan, China

As the de facto Internet inter-domain routing protocol, BGP has a number of vulnerabilities and weaknesses. Monitoring BGP is an effective way to improve the security of inter-domain routing. This paper presents a cooperative BGP monitoring method called the cooperative information sharing model (CoISM). CoISM is based on self-organization and can be used to solve some problems in BGP monitoring, such as route validation and the delivery of bogus route notifications. CoISM provides autonomous systems with a more comprehensive view of information by introducing information diffuse reflection based on initiative inquiry and by making use of the relativity of monitoring information. CoISM optimizes information transmission by leveraging the data locality caused by BGP policy, and it implements ISP coordination with low communication and deployment cost. More specifically, CoISM provides a self-organizing incentive mechanism, which drives autonomous systems to coordinate independently and share information on demand. CoISM supports incremental deployment and can also be applied to a wide range of inter-domain cooperative management applications, such as inter-domain routing failure analysis and intrusion detection. © 2014 John Wiley & Sons, Ltd.

Keywords: Monitoring

Experimental verification of the parasitic bipolar amplification effect in PMOS single event transients

Chinese Physics B, 2014, Vol. 23, No. 7, pp. 775-779

Authors: He Yibai (何益百); Chen Shuming (陈书明). Affiliations: College of Computer, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event charge collection is composed of diffusion, drift, and the parasitic bipolar effect, while for PMOSs in the special layout, the parasitic bipolar junction transistor cannot turn on. Heavy ion experimental results show that PMOSs without parasitic bipolar amplification have a 21.4% decrease in the average SET pulse width and roughly a 40.2% reduction in the SET cross-section.

Keywords: single event effect; single event transient; parasitic bipolar amplification; heavy ion experiments

Partial Clones for Stragglers in MapReduce

International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2015

Authors: Jia Li; Changjian Wang; Dongsheng Li; Zhen Huang. Affiliation: National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology

Stragglers can delay jobs and reduce cluster efficiency *** researches have been contributed to the solution, such as Blacklist [8], speculative execution [1,6], and Dolly [8]. In this paper, we put forward a new approach for mitigating stragglers in MapReduce, name *** starts task clones only for high-risk delaying *** experiments have been carried and results show that it can decrease the job delaying risk with fewer resources *** small jobs, Hummer also improves job completion time by 48% and 10% compared to LATE and Dolly.

Keywords: MapReduce; mitigating stragglers; task clones

Implementation of an Accurate and Efficient Compensated DGEMM for 64-bit ARMv8 Multi-Core Processors

International Conference on Parallel and Distributed Systems (ICPADS)

Authors: Hao Jiang; Feng Wang; Kuan Li; Canqun Yang; Kejia Zhao; Chun Huang. Affiliations: College of Computer Science, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN: (print) 9781467386692

This paper presents an implementation of an accurate and efficient compensated Double-precision General Matrix Multiplication (DGEMM) based on OpenBLAS for 64-bit ARMv8 multi-core processors. Due to cancellation phenomena in floating point arithmetic, the results of DGEMM may not be as accurate as expected. In order to increase the accuracy of DGEMM, we compensate the error introduced by its dot product kernel (GEBP) by applying an error-free transformation to rewrite the kernel in assembly language. We optimize the computations in the inner kernel through exploiting loop unrolling, instruction scheduling and software-implemented register rotation to exploit instruction level parallelism (ILP). We also conduct a priori error analysis of the derived CompDGEMM. Our compensated DGEMM is as accurate as the existing quadruple precision GEMM using MBLAS, but is up to 6.4x faster. Our parallel implementation achieves good performance and scalability under varying thread counts across a range of matrix sizes evaluated.

Keywords: Multicore processing; Kernel; Registers; Libraries; Error analysis; Algorithm design and analysis
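
The idea of compensating a dot product with error-free transformations, mentioned in the abstract above, can be illustrated with a small sketch. The abstract does not say which transformations the authors use, so the classical TwoSum and Dekker/Veltkamp TwoProduct are assumed here, and the scalar loop below stands in for the paper's assembly-language GEBP kernel rather than reproducing it. All function names are hypothetical.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e


def split(a):
    """Veltkamp split of a double into a high and a low part."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and a * b = p + e, barring over/underflow."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def dot_naive(x, y):
    """Plain floating-point dot product."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s


def dot_compensated(x, y):
    """Dot product that accumulates the rounding errors and adds them back."""
    s = 0.0
    c = 0.0
    for a, b in zip(x, y):
        p, ep = two_prod(a, b)
        s, es = two_sum(s, p)
        c += ep + es
    return s + c


# Cancellation example: the exact result is 1, but the naive sum loses it.
x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
print(dot_naive(x, y), dot_compensated(x, y), exact)
```

The compensated loop performs several extra floating-point operations per term, which is the cost that kernel-level optimization of the kind described in the abstract aims to hide.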
