Search Results - Inner Mongolia University Library

IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, PACRIM 2015

Authors: Wang, Qinglin; Liu, Jie; Cui, Xiantao; Fu, Guitao; Gong, Chunye; Xing, Zuocheng. Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; School of Computer Science, National University of Defense Technology, Changsha 410073, China; Beijing Satellite Navigation Center, Beijing 100094, China

ISBN: (print) 9781467377881

The coupling of microwaves into apertures plays an important part in many electromagnetic physics and engineering fields. When the apertures are very narrow, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. Intel's Many Integrated Core (MIC) architecture is a many-core architecture that provides 512-bit vector units and more than 200 threads. In this paper, we parallelize FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture. In the implementation, the OpenMP parallel programming model is used to exploit thread parallelism, while loop unrolling and SIMD intrinsic functions are used to accomplish vectorization. Compared with the serial version on an Intel Xeon E5-2670 CPU, the implementation on a 57-core MIC coprocessor obtains a speedup of 11.57 times. The experimental results also demonstrate that the parallelization scales well in performance. Additionally, the effect of the binding between OpenMP threads and MIC hardware threads on performance is reported. © 2015 IEEE.

Keywords: Scalability
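To illustrate the kind of update loop the abstract above describes parallelizing, here is a minimal one-dimensional FDTD sketch in plain Python. It is not the authors' code: the grid size, step count, normalized units, Courant number, and Gaussian source are all assumptions chosen for illustration, and the OpenMP threading and SIMD intrinsics mentioned in the abstract are only indicated in comments.

```python
import math

def fdtd_1d(n_cells=200, n_steps=150, source_cell=100):
    """Minimal 1D FDTD (Yee) update in normalized units (illustrative only)."""
    ez = [0.0] * n_cells   # electric field samples
    hy = [0.0] * n_cells   # magnetic field samples, staggered by half a cell
    courant = 0.5          # assumed Courant number; values <= 1 keep 1D FDTD stable
    for step in range(n_steps):
        # Iterations of this inner loop are independent of each other, which is
        # what makes such loops candidates for thread and SIMD parallelism.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k - 1] - hy[k])
        # Additive Gaussian pulse source (hypothetical parameters).
        ez[source_cell] += math.exp(-0.5 * ((step - 30) / 8.0) ** 2)
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])
    return ez
```

Calling `fdtd_1d()` returns the electric field after the pulse has travelled outward from the source cell; with these assumed parameters the pulse has not yet reached the grid edges.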

DIPP——An LLC Replacement Policy for On-chip Dynamic Heterogeneous Multi-core Architecture

International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2015

Authors: Zhang Yang; Xing Zuocheng; Ma Xiao. Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

As the big data era is coming, it brings new challenges to the massive data processing. A combination of GPU and CPU on chip is the trend to release the pressure of large scale *** found that there are different memory access characteristics between GPU and *** most important one is that the programs of GPU include a large number of threads, which lead to higher access frequency in cache than the CPU *** the LRU policy favors the programs with high memory access frequency, the programs of GPU can't get the corresponding performance boost even more cache resources are *** LRU policy is not suitable for heterogeneous multi-core *** on the different characteristics of GPU and CPU programs on memory access, this paper proposes an LLC dynamic replacement policy, DIPP (Dynamic Insertion/Promotion Policy), for heterogeneous multi-core *** core idea of the replacement policy is to reduce the miss rate of the program and enhance the overall system performance by limiting the cache resources that GPU can acquire and reducing the thread interferences between *** compare the DIPP replacement policy with LRU and we conduct a classified discussion according to the program results of *** programs enhance 23.29% on the average performance (using arithmetic mean). Large working sets programs can improve 13.95%, compute-intensive programs enhance 9.66% and stream class programs improve 3.8%.

Keywords: Big data; Heterogeneous Multicore; Replacement Policy; DIPP

Accelerating Monte Carlo Simulation of Neutron Transport on the Intel MIC Architecture

International Conference on Information Science and Control Engineering (ICISCE)

Authors: Xiantao Cui; Jie Liu; Lihua Chi; Qinglin Wang. Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN: (print) 9781467368513

Simulation of particle transport is critical for a great many scientific and engineering domains. The Monte Carlo (MC) method is one of the most important numerical methods for the simulation of particle transport, and can simulate many complex types of particle transport. However, the computational requirements of MC simulation are very large. In 2010, Intel announced the Intel Many Integrated Core (MIC) architecture, which consists of many simple general-purpose cores and supports the well-known shared-memory execution model that is the basis of most nodes in HPC machines. Because the simulation of each particle in the MC method is independent, MC simulation is well suited to acceleration on MIC. In this paper, a MIC-based algorithm named MCNP-MIC is presented for MC simulation of neutron transport in the context of the deep penetration problem. It includes the development of a parallel random number generator, the assignment of particle counts based on the number of threads, and the design of high-efficiency data structures for parallelism. With the same problem scale and computational accuracy, the MCNP-MIC algorithm achieves a roughly 5.6-fold speedup running on a 57-core MIC chip in comparison with the serial MCNP algorithm on an Intel Xeon E5-2670 CPU.

Keywords: Microwave integrated circuits; Neutrons; Algorithm design and analysis; Monte Carlo methods; Data structures; Graphics processing units
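The abstract above relies on particle histories being independent and on each thread having its own random number stream. The sketch below illustrates that idea on a toy problem, not on MCNP-MIC itself: a purely absorbing slab, where the transmitted fraction should approach exp(-sigma_t * thickness). The function names, the seeding scheme, and the slab model are assumptions made for this example, and the workers run sequentially here rather than on real threads.

```python
import math
import random

def count_transmitted(n_particles, thickness, sigma_t, seed):
    """Count particles whose first collision lies beyond a purely absorbing slab."""
    rng = random.Random(seed)  # private generator for this worker
    passed = 0
    for _ in range(n_particles):
        # Sample the free-flight distance from an exponential distribution.
        distance = -math.log(1.0 - rng.random()) / sigma_t
        if distance > thickness:
            passed += 1
    return passed

def simulate(n_workers=4, particles_per_worker=20000, thickness=1.0, sigma_t=1.0):
    # Histories are independent, so each worker handles its own share.
    # Distinct seeds stand in for a proper parallel random number generator.
    total = sum(
        count_transmitted(particles_per_worker, thickness, sigma_t, seed=1000 + w)
        for w in range(n_workers)
    )
    return total / (n_workers * particles_per_worker)
```

With the default arguments, `simulate()` should return a value close to `math.exp(-1.0)`; splitting the particle count evenly across workers mirrors the thread-based particle assignment the abstract mentions.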

RaceChecker: Efficient Identification of Harmful Data Races

Euromicro Conference on Parallel, Distributed and Network-Based Processing

Authors: Kai Lu; Zhendong Wu; Xiaoping Wang; Chen Chen; Xu Zhou. Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, PR China

ISBN: (print) 9781479984923

Data races hidden in concurrent programs have caused severe failures. To improve reliability, many race detectors have been proposed. However, most of the reported races are not harmful, so identifying the harmful ones consumes manual effort. This paper proposes RaceChecker, which can detect potential races and identify the harmful races effectively and efficiently. Unlike previous detectors, RaceChecker combines the happens-before relation and ad-hoc synchronization to prune infeasible races so that fewer potential races need to be verified. Before verification, RaceChecker groups the remaining potential races, guaranteeing that the potential races in one group do not interfere with each other. Therefore, multiple potential races in one group can be verified together in one execution. To our knowledge, this is the first effective technique that groups potential races to improve efficiency. Unlike previous detectors that verify one potential race in one execution, RaceChecker dynamically controls the thread scheduler to create real race conditions and verify multiple potential races in one execution, identifying the harmful races that cause program failures. We have implemented RaceChecker as a prototype tool and have experimented on a number of real-world concurrent programs. Results show that 66% of the potential races are infeasible and that the grouping strategy eliminates nearly 48% of the executions. The known harmful races are also identified effectively. By pruning and grouping, RaceChecker identifies the harmful races more efficiently. Compared with RaceMob and RaceFuzzer, the time is reduced significantly, by an average of 45% and 81%, respectively.

Keywords: Detectors; Synchronization; Concurrent computing; Programming; Relays; Instruction sets; Monitoring
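The abstract above mentions pruning infeasible races with the happens-before relation. One common way to test happens-before is with vector clocks; the sketch below uses that approach as an assumption, since the abstract does not say how RaceChecker represents the relation. The access record layout (`thread`, `addr`, `is_write`, `clock`) is hypothetical.

```python
def happens_before(vc_a, vc_b):
    """True if the event with vector clock vc_a is ordered before vc_b."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def is_potential_race(access_a, access_b):
    """Two accesses may race if they touch the same address from different
    threads, at least one is a write, and neither happens before the other."""
    same_addr = access_a["addr"] == access_b["addr"]
    has_write = access_a["is_write"] or access_b["is_write"]
    different_threads = access_a["thread"] != access_b["thread"]
    ordered = (happens_before(access_a["clock"], access_b["clock"])
               or happens_before(access_b["clock"], access_a["clock"]))
    return same_addr and has_write and different_threads and not ordered
```

For example, a write with clock `(1, 0)` and a read with clock `(0, 1)` are unordered and would be kept as a potential race, while a read with clock `(1, 1)` is ordered after the write and would be pruned.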

Partial Clones for Stragglers in MapReduce

International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2015

Authors: Jia Li; Changjian Wang; Dongsheng Li; Zhen Huang. National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology

Stragglers can temporize jobs and reduce cluster efficiency *** researches have been contributed to the solution, such as Blacklist [8], speculative execution [1,6], Dolly [8]. In this paper, we put forward a new approach for mitigating stragglers in MapReduce, name *** starts task clones only for high-risk delaying *** experiments have been carried and results show that it can decrease the job delaying risk with fewer resources *** small jobs, Hummer also improves job completion time by 48% and 10% compared to LATE and Dolly.

Keywords: MapReduce; mitigating stragglers; task clones

Running mechanism and implementation technique of self-adaptive software in open environment

Jisuanji Xuebao/Chinese Journal of Computers, 2015, Vol. 38, No. 9, pp. 1893-1906

Authors: Mao, Xin-Jun; Dong, Meng-Gao; Qi, Zhi-Chang; Yin, Jun-Wen. Department of Computer Science and Technology, College of Computer, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha 410073, China; Laboratory of Science and Technology on Integrated Logistic Support, National University of Defense Technology, Changsha 410073, China

Due to the uncertainty and unpredictability of environment changes, it is a great challenge to develop self-adaptive systems in open environments. First, it is difficult for developers to clearly predict various environment changes and precisely define self-adaptation requirements at design time. Second, many self-adaptation decisions should be made by the system at run time. To deal with these problems, the paper presents an approach based on software agent technology and the organization metaphor to support the development and running of such systems. The approach enables developers to describe self-adaptive systems and investigate self-adaptation according to high-level organization abstractions. A self-adaptation mechanism called role dynamic binding is designed, and on-line self-adaptation is achieved by introducing enforcement learning. The paper details the on-line self-adaptation decision algorithm that integrates the dynamic binding mechanism with enforcement learning. In particular, a general-purpose and systematic software engineering solution for developing such systems is provided, including a self-adaptive software model, an implementation framework, a structured process, and the supporting software environment SADE+. A case study illustrates the approach and validates its effectiveness. © 2015, Jisuanji Xuebao/Chinese Journal of Computers. All rights reserved.

Keywords: Dynamics

Efficient Distributed Data Clustering on Spark

IEEE International Conference on Cluster Computing

Authors: Jia Li; Dongsheng Li; Yiming Zhang. National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha, China

ISBN: (print) 9781467365994

Data clustering is usually time-consuming since it by default needs to iteratively aggregate and process large volumes of data. Approximate aggregation based on samples provides fast and quality-ensured results. In this paper, we propose to apply approximation techniques to data clustering to obtain a trade-off between clustering efficiency and result quality, along with online accuracy estimation. The proposed method is based on bootstrap trials. We implemented this method as an Intelligent Bootstrap Library (IBL) on Spark to support efficient data clustering. Intensive evaluations show that IBL can provide a 2x speed-up over the state-of-the-art solution with the same error bound.

Keywords: Sparks; Accuracy; Data mining; Estimation error; Distributed databases; Approximation methods
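The abstract above bases its accuracy estimation on bootstrap trials. As a rough illustration of the general technique, and not of the IBL API (which the abstract does not describe), the sketch below estimates an aggregate (here a mean) from a sample and uses resampling with replacement to attach an error estimate. The function name, trial count, and seed are assumptions.

```python
import random
import statistics

def bootstrap_mean(sample, n_trials=200, seed=0):
    """Return the sample mean and a bootstrap estimate of its standard error."""
    rng = random.Random(seed)
    n = len(sample)
    trial_means = []
    for _ in range(n_trials):
        # One bootstrap trial: draw n items from the sample with replacement.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        trial_means.append(statistics.mean(resample))
    return statistics.mean(sample), statistics.stdev(trial_means)
```

The spread of the per-trial means serves as the error estimate, so a caller could stop sampling once that spread falls below a chosen error bound.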

Experimental verification of the parasitic bipolar amplification effect in PMOS single event transients

Chinese Physics B, 2014, Vol. 23, No. 7, pp. 775-779

Authors: He Yibai (何益百); Chen Shuming (陈书明). College of Computer, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event charge collection is composed of diffusion, drift, and the parasitic bipolar effect, while for PMOSs in the special layout, the parasitic bipolar junction transistor cannot turn on. Heavy ion experimental results show that PMOSs without parasitic bipolar amplification have a 21.4% decrease in the average SET pulse width and roughly a 40.2% reduction in the SET cross-section.

Keywords: single event effect; single event transient; parasitic bipolar amplification; heavy ion experiments

Two-dimensional Euler PCA for face recognition

21st International Conference on MultiMedia Modeling, MMM 2015

Authors: Tan, Huibin; Zhang, Xiang; Guan, Naiyang; Tao, Dacheng; Huang, Xuhui; Luo, Zhigang. Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China; Department of Computer Science and Technology, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China; Centre for Quantum Computation and Intelligent Systems and the Faculty of Engineering and Information Technology, University of Technology Sydney, 235 Jones Street, Ultimo NSW 2007, Australia

ISBN: (print) 9783319144412

Principal component analysis (PCA) projects data onto the directions with maximal variance. Since PCA is quite effective in dimension reduction, it has been widely used in computer vision. However, conventional PCA suffers from the following deficiencies: 1) it incurs high computational cost when handling high-dimensional data, and 2) it cannot reveal the nonlinear relationships among different features of the data. To overcome these deficiencies, this paper proposes an efficient two-dimensional Euler PCA (2D-ePCA) algorithm. In particular, 2D-ePCA learns the projection matrix on the 2D pixel matrix of each image without reshaping it into a long 1D vector, and uncovers nonlinear relationships among features by mapping the data onto a complex representation. Since such a 2D complex representation induces a much smaller kernel matrix and principal subspaces, 2D-ePCA incurs much less computational overhead than Euler PCA on large-scale datasets. Experimental results on popular face datasets show that 2D-ePCA outperforms the representative algorithms in terms of accuracy, computational overhead, and robustness. © Springer International Publishing Switzerland 2015.

Keywords: Principal component analysis
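The abstract above says the method maps image data onto a complex representation while keeping the 2D pixel layout. The sketch below shows only that mapping step, using the form usually given for the Euler representation in the Euler PCA literature, z = exp(i·α·π·x)/√2 for a pixel intensity x in [0, 1]. Both this form and the default α = 1.9 are assumptions here, since the abstract gives no formula, and the projection-learning step of 2D-ePCA is not shown.

```python
import cmath
import math

def euler_map(image, alpha=1.9):
    """Map a 2D list of pixel intensities in [0, 1] to complex values,
    keeping the row/column layout of the image."""
    scale = 1.0 / math.sqrt(2.0)
    return [[scale * cmath.exp(1j * alpha * math.pi * x) for x in row]
            for row in image]
```

Every mapped pixel has the same magnitude, so intensity is carried entirely by the phase, which is where the nonlinearity of the representation comes from.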

Comparison of heavy-ion induced SEU for D- and TMR-flip-flop designs in 65-nm bulk CMOS technology

Science China (Information Sciences), 2014, Vol. 57, No. 10, pp. 223-229

Authors: HE YiBai; CHEN ShuMing. School of Computer Science, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

Heavy ion experiments were performed on a D flip-flop (DFF) and a TMR flip-flop (TMRFF) fabricated in a 65-nm bulk CMOS process. The experimental results show that the TMRFF has about a 92% decrease in SEU cross-section compared to the standard DFF design in static test mode. In dynamic test mode, the TMRFF shows much stronger frequency dependency than the DFF design, which reduces its advantage over the DFF at higher operating frequencies. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such a small improvement in the SEU performance of the TMR design may warrant reconsideration of its use in hardening designs.

Keywords: SEU; flip-flop; TMR; heavy-ion; frequency
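For readers unfamiliar with triple modular redundancy (TMR), the idea behind the TMR flip-flop in the abstract above is that three redundant copies of a bit feed a majority voter, so a single upset copy is outvoted. The sketch below models only that voting logic in software. It is a generic illustration, not the circuit evaluated in the paper, and the function name is hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority vote over three redundant copies of a stored value."""
    return (a & b) | (b & c) | (a & c)
```

For example, `majority(1, 1, 0)` still yields `1` after one copy is upset. As a point of comparison for the figures in the abstract, a 92% decrease in cross-section corresponds to roughly 1 / (1 - 0.92) = 12.5× in static mode, against the reported 3.2× at 160 MHz.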
