ISBN: 9781510835368 (print)
Convolution is the most important and time-consuming step in a convolutional neural network *** this work, we analyze the computing complexity of direct convolution and fast-Fourier-transform-based (FFT-based) *** creatively propose CS-unit, which is equivalent to a combination of a convolutional layer and a pooling layer but more *** computing complexity of and some other similar operation is demonstrated, revealing an advantage on computation of ***, practical experiments are also performed, and the results show that CS-unit holds a real advantage in run time.
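The contrast between direct and FFT-based convolution mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's CS-unit and not its complexity analysis: it assumes 1-D signals instead of 2-D feature maps, and the function names `direct_conv`, `fft`, and `fft_conv` are hypothetical. Direct convolution costs on the order of N·K multiplications, while the FFT route costs on the order of M log M for a padded length M.

```python
import cmath


def direct_conv(x, h):
    """Linear convolution from the definition: len(x) * len(h) multiplications."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    The inverse transform is left unscaled; the caller divides by len(a).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_conv(x, h):
    """Linear convolution via zero-padding, pointwise product, inverse FFT."""
    size = len(x) + len(h) - 1
    n = 1
    while n < size:  # pad to the next power of two
        n *= 2
    fx = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    fh = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(fx, fh)], invert=True)
    return [v.real / n for v in y[:size]]
```

Both functions return the same sequence up to floating-point rounding. For the short kernels typical of CNN layers the direct form often does less work, which is the kind of trade-off such a complexity comparison examines.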
ISBN: 9781509045181 (print)
Powering is an important operation in many computation-intensive workloads. This paper investigates, at the application level, the performance of different ways of computing powering operations. A series of small benchmark codes that calculate powers in different ways is designed, and their performance is evaluated on an Intel Xeon CPU under Intel compilation environments. The results show that the number of floating-point operations and the related runtime are sensitive to the value of the exponent Y and to how it is used. When Y is an immediate integer whose value is known at compile time, the cost of powering is much lower than when Y is an integer variable whose value is known only at runtime. When Y is defined as a real variable, the cost of powering is always high, whether or not its value equals an integer. Based on these investigations, performance optimizations are applied to a kernel subroutine from a real-world supersonic combustion simulation code that makes intensive use of powering operations. The results show that the performance of that subroutine is improved by 13.25 times on the Intel Xeon E5-2692 CPU.
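The three exponent cases distinguished in the abstract above can be sketched as follows. This is only an illustration in Python, not the paper's benchmark code (which targets Intel compilers), and the helper names `cube` and `pow_int` are hypothetical. A compile-time integer exponent can be expanded into a fixed chain of multiplications, a runtime integer exponent can use square-and-multiply, and a real exponent goes through a general-purpose power routine.

```python
import math


def cube(x):
    """Exponent known in advance (Y = 3): two multiplications, no library call."""
    return x * x * x


def pow_int(x, n):
    """Square-and-multiply for a runtime integer exponent n >= 0.

    Returns the power and the number of multiplications performed,
    which grows with log2(n) rather than with n.
    """
    result, base, mults = 1.0, x, 0
    while n > 0:
        if n & 1:          # current binary digit of n is 1
            result *= base
            mults += 1
        n >>= 1
        if n:              # square only if more digits remain
            base *= base
            mults += 1
    return result, mults


def pow_real(x, y):
    """Real-valued exponent: delegates to the general routine math.pow."""
    return math.pow(x, y)
```

For example, `pow_int(2.0, 10)` needs five multiplications, whereas `pow_real(2.0, 10.0)` takes the general path even though the exponent happens to equal an integer.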
In this research, we apply Green's theory to convert the partial differential equation into a boundary integral equation for geometric transformation. Green's theory is designed specifically for integr...
ISBN: 9781467390408 (print)
The Binary Exchange Algorithm (BEA) always introduces excessive shuffle operations when mapping FFTs onto vector SIMD DSPs, which can greatly restrict overall performance. We propose a novel mod (2P-1) shuffle function and a Mod-BEA algorithm (MBEA), which can halve the shuffle operation count and unify the shuffle mode. This unified shuffle mode inspires us to propose a set of novel mod (2P-1) shuffle memory-access instructions, which can eliminate the shuffle operations entirely. Experimental results show that the combination of MBEA and the proposed instructions brings performance improvements of 17.2%-31.4% at reasonable hardware cost and reduces the code size by about 30%.
Adaptivity is the capacity of software to adjust itself to changes in its environment. A common approach to achieving adaptivity is to introduce dedicated code during the software development stage. However, since those code fragments are designed a priori, self-adaptive software cannot handle situations adequately when the contextual changes go beyond those originally anticipated. In this case, the original built-in adaptivity should be tuned. For example, new code should be added to provide the capacity to sense the unexpected environment or to replace outdated adaptation decision logic. The technical challenges in this process, especially that of tuning software adaptivity at runtime, should not be understated. In this paper, we propose an architecture-centric application framework for self-adaptive software named Auxo. Similar to existing work, our framework supports the development and running of self-adaptive software. Furthermore, our framework supports the tuning of software adaptivity without requiring the running self-adaptive software to be terminated. In short, the architecture style that we introduce can encapsulate not only general functional logic but also the concerns in the self-adaptation loop (such as sensing, decision, and execution) as architecture elements. As a result, a third party, potentially the operator or an augmented software entity equipped with explicit domain knowledge, is able to dynamically and flexibly adjust the self-adaptation concerns by modifying the runtime software architecture. To exercise, validate, and evaluate our approach, we describe a self-adaptive application deployed on the framework and conduct several experiments involving self-adaptation and the online tuning of software adaptivity.
A wearable device with an ego-centric camera would be the next-generation device for human-computer interaction such as robot *** gesture is a natural way of egocentric human-computer *** this paper, we present an ego-centric multi-stage hand gesture analysis pipeline for robot control which works robustly in unconstrained environments with varying *** particular, we first propose an adaptive color- and contour-based hand segmentation method to segment the hand region from the egocentric *** then propose a convex U-shaped curve detection algorithm to precisely detect the positions of *** parallelly, we utilize convolutional neural networks to recognize hand *** on these techniques, we combine most information of the hand to control the robot and develop a hand gesture analysis system on an iPhone and a robot arm platform to validate its *** result demonstrates that our method works perfectly on controlling the robot arm by hand gesture in real time.
ISBN: 9781509032068 (print)
The QR decomposition (QRD) has been extensively adopted in the transceiver processors of multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. The antenna configuration of future MIMO-OFDM systems is very flexible. Therefore, the QRD architecture should also have the flexibility to decompose channel response matrices of various dimensions. However, the existing QRD hardware architectures for MIMO-OFDM systems mainly focus on matrices of a few fixed dimensions. Owing to its flexibility and scalability, the parallel tiled QRD algorithm is very suitable for future MIMO-OFDM systems. In this paper, a versatile hardware architecture (Ver_Arch) is designed for the bottleneck operations of the parallel tiled QRD algorithm. Based on the designed Ver_Arch, we also design a QRD architecture for 4×4 real matrices. To the best of our knowledge, this is the first paper that presents a complete QRD hardware architecture based on the parallel tiled QRD algorithm for MIMO-OFDM wireless communication systems.
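For readers unfamiliar with the operation itself, here is a minimal software sketch of a QR decomposition of a small real matrix such as the 4×4 case mentioned above. It uses plain Givens rotations, which is an assumption made for illustration: it is not the parallel tiled QRD algorithm and not the Ver_Arch hardware design. The function names `givens_qr` and `matmul` are hypothetical.

```python
import math


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def givens_qr(a):
    """Factor a square real matrix as A = Q R using Givens rotations.

    Q is orthogonal and R is upper triangular.
    """
    n = len(a)
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):                    # column being cleared
        for i in range(n - 1, j, -1):     # zero r[i][j] from the bottom up
            if r[i][j] == 0.0:
                continue
            norm = math.hypot(r[i - 1][j], r[i][j])
            c, s = r[i - 1][j] / norm, r[i][j] / norm
            for k in range(n):
                # Rotate rows i-1 and i of R.
                up, lo = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * up + s * lo
                r[i][k] = -s * up + c * lo
                # Apply the transposed rotation to columns i-1 and i of Q.
                qu, ql = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * qu + s * ql
                q[k][i] = -s * qu + c * ql
    return q, r
```

Calling `q, r = givens_qr(a)` on a 4×4 list of rows and then `matmul(q, r)` should reproduce `a` up to rounding error. Each rotation touches only two rows, which is one reason rotation-based schemes are commonly considered for hardware.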
This paper investigates the problem of maximizing uniform multicast throughput (MUMT) for multi-channel dense wireless sensor networks, where all nodes are located within one-hop transmission range and can communicate with...
The coming big data era brings new challenges to massive data processing. Combining a GPU and a CPU on one chip is a trend for relieving the pressure of large-scale computing. We found that there are diffe...
ISBN: 9781509006212 (print)
Temporal alignment aligns two temporal sequences and is quite challenging because of drastic differences among temporal sequences and because source data may come from different views. Canonical time warping (CTW) has shown great potential in temporal alignment tasks because it can reduce data redundancy by transforming high-dimensional data to a lower-dimensional subspace via canonical correlation analysis (CCA). However, CTW cannot uncover the underlying nonlinear structure embedded in the dataset. In this paper, we propose an autoencoder regularized canonical time warping method (AECTW) to overcome this drawback. Specifically, AECTW enhances the lower-dimensional representation of each sequence by incorporating an autoencoder regularization, while revealing the nonlinear structure of the features through an explicit nonlinear transformation. With these strategies, AECTW significantly boosts CTW in temporal alignment tasks. Experiments on both synthetic data and two practical human action datasets demonstrate that AECTW outperforms the representative DTW-based methods.
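As background for the DTW-based baselines mentioned above, the following is a minimal sketch of classic dynamic time warping on two 1-D sequences. It is not CTW or AECTW: there is no CCA projection and no autoencoder term, and the function name `dtw_distance` and the absolute-difference cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    d[i][j] holds the minimum cumulative cost of aligning a[:i] with b[:j],
    where each step may advance one sequence or both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])       # local mismatch
            d[i][j] = cost + min(d[i - 1][j],      # a advances, b repeats
                                 d[i][j - 1],      # b advances, a repeats
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated sample can be absorbed by the warping path. Plain DTW of this kind aligns only in the original feature space, which is the limitation that subspace methods such as CTW address.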