Search Results - Inner Mongolia University Library

2016 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE2016)

Authors: Zelong Wang, Qiang Lan, Dafei Huang, Mei Wen; Department of Computer, National University of Defense Technology; National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology

ISBN: (print) 9781510835368

Convolution is the most important and most time-consuming operation in a convolutional neural network *** this work, we analyze the computing complexity of direct convolution and fast-Fourier-transform-based (FFT-based) *** creatively propose CS-unit, which is equivalent to a combination of a convolutional layer and a pooling layer but more *** computing complexity of and some other similar operations is demonstrated, revealing an advantage on computation of ***, practical experiments are also performed, and the results show that CS-unit has a real advantage in run time.

Keywords: computing complexity; FFT-based convolution; CS-unit
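
The abstract contrasts direct and FFT-based convolution. As a minimal sketch, assuming 1-D signals rather than the 2-D feature maps of a CNN (CS-unit itself is not reproduced), the Python below computes the same full convolution both ways: the direct loop costs len(x)·len(h) multiply-adds, while the FFT route costs O(L log L) for a zero-padded power-of-two length L. The function names are illustrative.

```python
import cmath
from typing import List


def direct_conv(x: List[float], h: List[float]) -> List[float]:
    """Full linear convolution by definition: len(x) * len(h) multiply-adds."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def fft(a: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    The inverse transform is returned unscaled (the caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_conv(x: List[float], h: List[float]) -> List[float]:
    """Linear convolution via zero-padded FFTs: O(L log L), L = next power of two >= output length."""
    m = len(x) + len(h) - 1
    size = 1
    while size < m:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fh = fft([complex(v) for v in h] + [0j] * (size - len(h)))
    y = fft([a * b for a, b in zip(fx, fh)], invert=True)  # pointwise product, then inverse FFT
    return [v.real / size for v in y[:m]]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    h = [0.5, -1.0, 2.0]
    print(direct_conv(x, h))
    print([round(v, 6) for v in fft_conv(x, h)])
```

`direct_conv([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0])` returns `[0.5, 0.0, 1.5, 3.0, 2.0, 8.0]`, and `fft_conv` matches it up to rounding error. For short kernels the direct loop is usually cheaper; the FFT route pays off as the kernel grows.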

Benchmarking the Powering Computations for Application Tuning

International Conference on Software Analysis, Testing and Evolution (SATE)

Authors: Yongang Che, Chuanfu Xu, Zhenghua Wang; Computer College, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory (PDL), National University of Defense Technology, Changsha, China

ISBN: (print) 9781509045181

Powering (exponentiation) is an important operation in many computation-intensive workloads. This paper investigates, at the application level, the performance of different ways of computing powers. A series of small benchmark codes that compute powers in different ways are designed, and their performance is evaluated on an Intel Xeon CPU with the Intel compiler toolchain. The results show that the number of floating-point operations and the associated runtime are sensitive to the value of the exponent Y and to how it is used. When Y is an immediate integer whose value is known at compile time, powering is much cheaper than when Y is an integer variable whose value is known only at runtime. When Y is declared as a real variable, powering is always expensive, whether or not its value equals an integer. Based on these findings, performance optimizations are applied to a kernel subroutine from a real-world supersonic combustion simulation code that makes heavy use of powering operations. The optimized subroutine runs 13.25 times faster on the Intel Xeon E5-2692 CPU.

Keywords: Benchmark testing; Arrays; Libraries; Runtime; Signal processing algorithms; Hardware
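
The gap between a compile-time integer exponent and a runtime or real exponent can be illustrated with binary exponentiation: a constant integer Y lets a compiler expand x**Y into a short, fixed chain of multiplications, while a real Y generally needs a general-purpose routine, classically exp(Y·log x). This Python sketch only counts multiplications; it is not the paper's benchmark code, does not model any particular compiler, and `pow_by_squaring` and `pow_real` are hypothetical names.

```python
import math
from typing import Optional, Tuple


def pow_by_squaring(x: float, n: int) -> Tuple[float, int]:
    """x**n for an integer n >= 0 via binary exponentiation.

    Returns (result, multiplications). For a constant n, this fixed chain is the
    kind of sequence a compiler can emit in place of a library call.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    result: Optional[float] = None  # no factor accumulated yet
    base = x                        # holds x**(2**k) after k squarings
    mults = 0
    while n > 0:
        if n & 1:
            if result is None:
                result = base  # first factor is a copy, not a multiplication
            else:
                result *= base
                mults += 1
        n >>= 1
        if n:
            base *= base  # square only while higher bits remain
            mults += 1
    return (1.0 if result is None else result), mults


def pow_real(x: float, y: float) -> float:
    """Generic power for a real exponent and positive base via exp(y * log(x)).

    Production libm pow routines are more careful about accuracy, but they do
    not get cheaper just because y happens to hold an integral value.
    """
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    return math.exp(y * math.log(x))


if __name__ == "__main__":
    print(pow_by_squaring(1.5, 8))  # (25.62890625, 3)
    print(pow_real(1.5, 8.0))
```

`pow_by_squaring(1.5, 8)` returns `(25.62890625, 3)`: three squarings replace a general power call, the kind of saving available when Y is known at compile time.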

Direct method-green's theory: From PDE to BIE in the geometric transformation

2016 International Conference on Wavelet Analysis and Pattern Recognition, ICWAPR 2016

Authors: Yang, Li-Na; Li, Tao-Shen; Tang, Yuan Yan; Xu, Jia; Pan, Jian-Jia; Luo, Hui-Wu; Zheng, Xian-Wei; School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, China; Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing, Nanning 530004, China

ISBN: (print) 9781509035885

In this research, we apply Green's theory to convert the partial differential equation into a boundary integral equation for geometric transformation. Green's theory is designed specifically for integral equations. It is efficient at detecting singular points of the geometric transformation, which has been verified. Experimental results show that Green's theory performs well. © 2016 IEEE.

Keywords: Partial differential equations
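
The abstract does not state which PDE is treated. As a hedged orientation only, assuming the Laplace equation Δu = 0 on a 2-D domain Ω with outward normal n, the standard route from PDE to BIE uses Green's second identity with the fundamental solution G(x, y) = -(1/2π) ln|x - y|, which satisfies Δ_y G(x, y) = -δ(x - y):

```latex
\begin{aligned}
&\int_{\Omega} \bigl( u\,\Delta_y G - G\,\Delta u \bigr)\,dy
  = \int_{\partial\Omega} \Bigl( u\,\frac{\partial G}{\partial n_y} - G\,\frac{\partial u}{\partial n} \Bigr)\,ds_y \\[4pt]
&\Delta u = 0,\ \Delta_y G(x,y) = -\delta(x-y)
  \;\Longrightarrow\;
  u(x) = \int_{\partial\Omega} \Bigl( G(x,y)\,\frac{\partial u}{\partial n}(y) - u(y)\,\frac{\partial G}{\partial n_y}(x,y) \Bigr)\,ds_y,
  && x \in \Omega \\[4pt]
&\tfrac{1}{2}\,u(x) = \int_{\partial\Omega} \Bigl( G(x,y)\,\frac{\partial u}{\partial n}(y) - u(y)\,\frac{\partial G}{\partial n_y}(x,y) \Bigr)\,ds_y,
  && x \in \partial\Omega \ \text{(smooth point, principal value)}
\end{aligned}
```

The last line is the boundary integral equation: it involves only the boundary values of u and ∂u/∂n, so the unknowns live on ∂Ω rather than throughout Ω.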

Mod (2P-1) Shuffle Memory-Access Instructions for FFTs on Vector SIMD DSPs

IEEE Computer Society Annual Symposium on VLSI

Authors: Sheng Liu, Hanyan Chen, Jianghua Wan, Yaohua Wang; College of Computer, National University of Defense Technology, Changsha, Hunan, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan, China

ISBN: (print) 9781467390408

The Binary Exchange Algorithm (BEA) always introduces excessive shuffle operations when FFTs are mapped onto vector SIMD DSPs, which can greatly restrict overall performance. We propose a novel mod (2P-1) shuffle function and the Mod-BEA algorithm (MBEA), which halve the shuffle operation count and unify the shuffle mode. This unified shuffle mode inspires us to propose a set of novel mod (2P-1) shuffle memory-access instructions, which can completely eliminate the shuffle operations. Experimental results show that combining MBEA with the proposed instructions brings 17.2%-31.4% performance improvement at reasonable hardware cost and reduces code size by about 30%.

Keywords: Digital signal processing; Hardware; Pipelines; Computer architecture; Linearity; Computers; Software
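
A standard identity suggests why a modulo-(2^P - 1) index function fits FFT shuffles; here we assume "(2P-1)" in the title denotes 2^P - 1 with the superscript lost in indexing. On N = 2^P elements, the perfect shuffle moves the element at index i to 2i mod (N - 1) for i < N - 1 and leaves index N - 1 fixed, which equals a one-bit left rotation of the P-bit index. This Python sketch only checks that identity; it is not the paper's shuffle function or instruction set, and all names are illustrative.

```python
from typing import List, TypeVar

T = TypeVar("T")


def perfect_shuffle(v: List[T]) -> List[T]:
    """Interleave the two halves of v: [a0, a1, ..., b0, b1, ...] -> [a0, b0, a1, b1, ...]."""
    half = len(v) // 2
    out: List[T] = []
    for a, b in zip(v[:half], v[half:]):
        out.extend([a, b])
    return out


def mod_shuffle_target(i: int, p: int) -> int:
    """Destination of element i under the perfect shuffle of 2**p elements,
    computed as 2*i mod (2**p - 1), with the last index fixed."""
    n = 1 << p
    return i if i == n - 1 else (2 * i) % (n - 1)


def rotate_left(i: int, p: int) -> int:
    """One-bit left rotation of a p-bit index (the bit-level view of the same map)."""
    n = 1 << p
    return ((i << 1) | (i >> (p - 1))) & (n - 1)


if __name__ == "__main__":
    p = 3
    print(perfect_shuffle(list(range(1 << p))))  # [0, 4, 1, 5, 2, 6, 3, 7]
    print([mod_shuffle_target(i, p) for i in range(1 << p)])
```

For P = 3, element 4 lands at index 2·4 mod 7 = 1, matching its position in `[0, 4, 1, 5, 2, 6, 3, 7]`.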

Auxo: an architecture-centric framework supporting the online tuning of software adaptivity

Science China (Information Sciences), 2015, Vol. 58, No. 9, pp. 31-45

Authors: WANG HuaiMin, DING Bo, SHI DianXi, CAO JianNong, Alvin T.S. Chan; National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology; Department of Computing, Hong Kong Polytechnic University

Adaptivity is the capacity of software to adjust itself to changes in its environment. A common approach to achieving adaptivity is to introduce dedicated code during the software development stage. However, since those code fragments are designed a priori, self-adaptive software cannot handle situations adequately when contextual changes go beyond those originally anticipated. In this case, the original built-in adaptivity should be tuned. For example, new code should be added to sense the unexpected environment or to replace outdated adaptation decision logic. The technical challenges in this process, especially that of tuning software adaptivity at runtime, cannot be understated. In this paper, we propose an architecture-centric application framework for self-adaptive software named Auxo. Similar to existing work, our framework supports the development and running of self-adaptive software. Furthermore, our framework supports the tuning of software adaptivity without requiring the running self-adaptive software to be terminated. In short, the architecture style we introduce can encapsulate not only general functional logic but also the concerns in the self-adaptation loop (such as sensing, decision, and execution) as architecture elements. As a result, a third party, potentially the operator or an augmented software entity equipped with explicit domain knowledge, can dynamically and flexibly adjust the self-adaptation concerns by modifying the runtime software architecture. To exercise, validate, and evaluate our approach, we describe a self-adaptive application deployed on the framework and conduct several experiments involving self-adaptation and the online tuning of software adaptivity.

Keywords: software architecture; self-adaptive software; architecture style; application framework; software adaptation
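
The key idea, that sensing, decision and execution are first-class elements a third party can swap while the loop keeps running, can be sketched in a few lines of Python. `AdaptationLoop`, `replace_sensor` and `replace_decider` are hypothetical names; Auxo's actual architecture style and APIs are not described in the abstract.

```python
from typing import Callable, Dict, List, Optional

Context = Dict[str, float]
Sensor = Callable[[], Context]
Decider = Callable[[Context], List[str]]
Executor = Callable[[str], None]


class AdaptationLoop:
    """Hypothetical sense -> decide -> execute loop whose elements are replaceable at runtime."""

    def __init__(self, executor: Executor, decider: Optional[Decider] = None) -> None:
        self.sensors: Dict[str, Sensor] = {}
        self.decider: Decider = decider if decider is not None else (lambda ctx: [])
        self.executor = executor

    def replace_sensor(self, name: str, sensor: Sensor) -> None:
        """Add or swap a sensing element, e.g. to observe an unanticipated environment change."""
        self.sensors[name] = sensor

    def replace_decider(self, decider: Decider) -> None:
        """Swap outdated adaptation decision logic without stopping the loop."""
        self.decider = decider

    def step(self) -> List[str]:
        """One iteration: merge sensor readings, decide, then execute the chosen actions."""
        context: Context = {}
        for sensor in self.sensors.values():
            context.update(sensor())
        actions = self.decider(context)
        for action in actions:
            self.executor(action)
        return actions


if __name__ == "__main__":
    executed: List[str] = []
    loop = AdaptationLoop(executor=executed.append)
    loop.replace_sensor("cpu", lambda: {"cpu_load": 0.9})
    loop.replace_decider(lambda ctx: ["scale_out"] if ctx.get("cpu_load", 0.0) > 0.8 else [])
    print(loop.step())  # ['scale_out']
    # An operator tunes the threshold while the same loop object keeps running.
    loop.replace_decider(lambda ctx: ["scale_out"] if ctx.get("cpu_load", 0.0) > 0.95 else [])
    print(loop.step())  # []
```

Keeping each concern behind a small callable interface is what lets the decision logic be replaced between iterations without terminating the running loop.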

Towards Robust Ego-Centric Hand Gesture Analysis for Robot Control

2016 IEEE International Conference on Signal and Image Processing

Authors: Hongyong Song, Weijiang Feng, Naiyang Guan, Xuhui Huang, Zhigang Luo; Institute of Software, College of Computer, National University of Defense Technology; Department of Computer Science and Technology, College of Computer, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

Wearable devices with an ego-centric camera would be the next generation of devices for human-computer interaction, such as robot *** gesture is a natural way of egocentric human-computer *** this paper, we present an ego-centric multi-stage hand gesture analysis pipeline for robot control that works robustly in unconstrained environments with varying *** particular, we first propose an adaptive color- and contour-based hand segmentation method to segment the hand region from the egocentric *** then propose a convex U-shaped curve detection algorithm to precisely detect the positions of *** in parallel, we use convolutional neural networks to recognize hand *** on these techniques, we combine most of the hand information to control the robot and develop a hand gesture analysis system on an iPhone and a robot arm platform to validate its *** result demonstrates that our method works perfectly for controlling the robot arm by hand gesture in real time.

Keywords: ego-centric vision; hand detection and segmentation; fingertips detection; hand gesture recognition; robot control; human-computer interaction

An Architecture of parallel Tiled QRD Algorithm for MIMO-OFDM Systems

IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

Authors: Cang Liu, Chuan Tang, Zuocheng Xing, Lirui Chen, Yang Zhang, Guitao Fu; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; Beijing Satellite Navigation Center, Beijing, China

ISBN: (print) 9781509032068

QR decomposition (QRD) has been widely adopted in the transceiver processors of multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. The antenna configuration of future MIMO-OFDM systems is very flexible, so the QRD architecture should also be flexible enough to decompose channel response matrices of various dimensions. However, existing QRD hardware architectures for MIMO-OFDM systems mainly target a few fixed matrix dimensions. Owing to its flexibility and scalability, the parallel tiled QRD algorithm is well suited to future MIMO-OFDM systems. In this paper, a versatile hardware architecture (Ver_Arch) is designed for the bottleneck operations of the parallel tiled QRD algorithm. Based on Ver_Arch, we also design a QRD architecture for 4×4 real matrices. To the best of our knowledge, this is the first paper to present a complete QRD hardware architecture based on the parallel tiled QRD algorithm for MIMO-OFDM wireless communication systems.

Keywords: Matrix decomposition; Hardware; Algorithm design and analysis; Computer architecture; Wireless communication; MIMO; OFDM
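
For readers unfamiliar with QRD, a minimal Python sketch of a plain, untiled QR decomposition of a 4×4 real matrix by Givens rotations may help. Givens rotations are a common hardware-friendly choice because each rotation updates only two rows; this is not the paper's parallel tiled algorithm or Ver_Arch, and the function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def givens_qr(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Plain (untiled) QR of a square real matrix by Givens rotations.

    Returns (Q, R) with Q orthogonal and R upper triangular such that A = Q R.
    """
    n = len(a)
    r = [row[:] for row in a]
    # qt accumulates the product of all applied rotations, i.e. Q transposed.
    qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n - 1):
        # Zero the sub-diagonal entries of column j from the bottom up.
        for i in range(n - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # The rotation touches only rows i-1 and i of R and of Q^T.
            for mat in (r, qt):
                top, bottom = mat[i - 1], mat[i]
                for k in range(n):
                    t, b = top[k], bottom[k]
                    top[k] = c * t + s * b
                    bottom[k] = -s * t + c * b
    q = [[qt[j][i] for j in range(n)] for i in range(n)]  # transpose back to Q
    return q, r


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Naive matrix product, used to check A = Q R."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    a = [[4.0, 1.0, -2.0, 2.0],
         [1.0, 2.0, 0.0, 1.0],
         [-2.0, 0.0, 3.0, -2.0],
         [2.0, 1.0, -2.0, -1.0]]
    q, r = givens_qr(a)
    for row in r:
        print([round(v, 4) for v in row])
```

In MIMO detection the channel matrix would typically be complex and its dimension would vary, which is exactly the flexibility the abstract argues a tiled scheme provides; this sketch covers only the fixed 4×4 real case.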

Maximizing Uniform Multicast Throughput in Multi-Channel Dense Wireless Sensor Networks

12th International Conference on Mobile Ad-Hoc and Sensor Networks, MSN 2016

Authors: Jiao, Xianlong; Chen, Guirong; Wang, Xiaodong; Chen, Yuli; Yang, Li; Information and Navigation College, Air Force Engineering University, Xi'an 710077, China; College of Information System and Management, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; Chongqing Guanyinqiao Elementary School, Chongqing 400020, China; Chongqing Liangjiangxinqu Renhe Experimental School, Chongqing 400021, China

ISBN: (print) 9781509056965

This paper investigates the problem of maximizing uniform multicast throughput (MUMT) in multi-channel dense wireless sensor networks, where all nodes are located within one-hop transmission range and can communicate with each other on multiple orthogonal channels. Such networks have wide real-world applications, and maximizing their uniform multicast throughput merits in-depth study. Previous research has proved that the MUMT problem is NP-hard. However, existing solutions are either hard to implement or use too many relay nodes to complete the multicast task, and thus incur high overhead or perform poorly. To solve the MUMT problem efficiently, we adopt the concept of the maximum independent set with a size constraint and, based on this concept, present a novel Single-Broadcast based Multicast algorithm called SBM. We prove that SBM achieves a constant ratio to the theoretical throughput upper bound. Extensive experimental results demonstrate that SBM outperforms existing work in terms of both uniform multicast throughput and the total number of transmissions. © 2016 IEEE.

Keywords: Throughput

DIPP—An LLC replacement policy for on-chip dynamic heterogeneous multi-core architecture

International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2015

Authors: Yang, Zhang; Zuocheng, Xing; Xiao, Ma; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, ChangSha, China

ISBN: (print) 9783662462478

With the arrival of the big data era, massive data processing faces new challenges. Integrating a GPU and a CPU on one chip is a trend for relieving the pressure of large-scale computing. We found that GPU and CPU programs have different memory access characteristics. The most important difference is that GPU programs contain a large number of threads, which leads to a higher cache access frequency than in CPU programs. Although the LRU policy favors programs with high memory access frequency, GPU programs do not obtain a corresponding performance boost even when more cache resources are provided. Therefore, the LRU policy is not well suited to heterogeneous multi-core processors. Based on the different memory access characteristics of GPU and CPU programs, this paper proposes DIPP (Dynamic Insertion/Promotion Policy), a dynamic LLC replacement policy for heterogeneous multi-core processors. The core idea of the policy is to reduce program miss rates and improve overall system performance by limiting the cache resources the GPU can acquire and reducing thread interference between programs. Experiments compare DIPP with LRU, and the results are discussed separately for each class of GPU program. Average performance (arithmetic mean) improves by 23.29% for friendly programs, 13.95% for programs with large working sets, 9.66% for compute-intensive programs, and 3.8% for streaming programs. © Springer-Verlag Berlin Heidelberg 2015.

Keywords: Graphics processing unit
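
DIPP's actual insertion and promotion rules are not given in the abstract. To illustrate how an insertion/promotion policy can limit the cache share of GPU traffic, here is a Python sketch of a single 4-way LLC set under a hypothetical simplification: CPU lines follow plain LRU, while GPU lines are inserted at the LRU position and promoted only one step per hit. `CacheSet` and `simulate` are illustrative names.

```python
from typing import List


class CacheSet:
    """One set of a set-associative LLC, kept as a recency stack (index 0 = MRU).

    Hypothetical insertion/promotion rules (not DIPP's published rules):
      * CPU lines: inserted at MRU and moved to MRU on a hit (plain LRU).
      * GPU lines: inserted at the LRU position and promoted one step per hit,
        so streaming GPU traffic cannot flush the CPU working set.
    """

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.stack: List[int] = []  # cached tags, most recently used first
        self.hits = 0
        self.misses = 0

    def access(self, tag: int, from_gpu: bool) -> bool:
        """Look up tag and return True on a hit; tags are assumed unique across CPU and GPU."""
        if tag in self.stack:
            self.hits += 1
            pos = self.stack.index(tag)
            self.stack.pop(pos)
            self.stack.insert(max(pos - 1, 0) if from_gpu else 0, tag)
            return True
        self.misses += 1
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the line at the LRU position
        if from_gpu:
            self.stack.append(tag)  # GPU fill goes to the LRU position
        else:
            self.stack.insert(0, tag)  # CPU fill goes to the MRU position
        return False


def simulate(split_policy: bool, rounds: int = 10) -> int:
    """CPU reuses 3 lines while the GPU streams 2 new lines per round through a 4-way set.
    Returns total hits; split_policy=False treats GPU lines like CPU lines (plain LRU)."""
    s = CacheSet(ways=4)
    for r in range(rounds):
        for tag in (1, 2, 3):
            s.access(tag, from_gpu=False)
        for g in (2 * r, 2 * r + 1):
            s.access(1000 + g, from_gpu=split_policy)
    return s.hits


if __name__ == "__main__":
    print("plain LRU hits:", simulate(split_policy=False))
    print("split insertion hits:", simulate(split_policy=True))
```

In this toy trace plain LRU gets 0 hits, because five distinct lines per round cycle through four ways, while the split insertion rule keeps the three CPU lines resident and gets 27 hits over 10 rounds.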

Enhancing Temporal Alignment with Autoencoder Regularization

International Joint Conference on Neural Networks

Authors: Liquan Nie, Yuanyuan Wang, Xiang Zhang, Xuhui Huang, Zhigang Luo; Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology; Department of Basic Courses, Army Officer Academy; Department of Computer Science and Technology, College of Computer, National University of Defense Technology

ISBN: (print) 9781509006212

Temporal alignment aligns two temporal sequences and is quite challenging because of drastic differences among temporal sequences and source data from different views. Canonical time warping (CTW) has shown great potential in temporal alignment tasks because it can reduce data redundancy by projecting high-dimensional data onto a lower-dimensional subspace via canonical correlation analysis (CCA). However, CTW cannot uncover the underlying nonlinear structure embedded in the dataset. In this paper, we propose an autoencoder regularized canonical time warping method (AECTW) to overcome this drawback. Specifically, AECTW enhances the lower-dimensional representation of each sequence by incorporating an autoencoder regularization, while revealing the nonlinear structure of features through an explicit nonlinear transformation. With these strategies, AECTW significantly improves on CTW in temporal alignment tasks. Experiments on both synthetic data and two practical human action datasets demonstrate that AECTW outperforms representative DTW-based methods.

Keywords: Nonlinear structures; Data redundancy; Dataset; Synthetic data; Alignment; Warping; Structural properties; Canonical Religious Missions; Data sources
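
CTW and AECTW build on dynamic time warping (DTW), the baseline family the abstract compares against. As a minimal Python sketch, here is classic DTW on 1-D sequences with an absolute-difference cost and path backtracking; it does not include CTW's CCA projection or AECTW's autoencoder regularization, and `dtw` is an illustrative name.

```python
import math
from typing import List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic dynamic time warping with |x_i - y_j| as the local cost.

    Returns (total alignment cost, warping path as (i, j) index pairs).
    """
    n, m = len(x), len(y)
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]; row/column 0 are boundary cells.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances, y repeats
                                 cost[i][j - 1],      # y advances, x repeats
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from (n, m) along the cheapest predecessors.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    dist, path = dtw([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 2.0, 2.0, 3.0])
    print(dist, path)
```

`dtw([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 2.0, 2.0, 3.0])` returns a distance of 0.0, because the warping path can repeat elements of the shorter sequence to match the longer one exactly. CTW applies the same warping idea to sequences after projecting them with CCA.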
