Detecting human keypoints from a single image is very challenging due to occlusion, blurring, and changes in illumination and scale. This paper addresses the problem by designing an effective network structure. Because global and local information play an important role in reasoning about human body structure and invisible keypoints, a Multi-level Attention Network (MAN) is proposed. First, compared with traditional multi-resolution networks, MAN generates its multi-resolution feature maps directly from the highest-resolution feature map, which gives them greater information variance and in turn enriches the feature information after the final fusion. Second, it effectively integrates global and local information across feature maps of different resolutions through the Feature Alignment Attention Block (FAAB) and strengthens them in a targeted manner. On the COCO dataset, with HRNet (Sun K. et al. [1]) as the baseline network, HRNet with MAN inserted improves on the baseline by 1.1-2.3 AP points.
Monocular depth estimation is a fundamental task in computer vision and has drawn increasing attention. Recently, attention-based models and encoder-decoder architectures have led to great improvements in monocular depth estimation. Most previous methods use repeated simple up-sampling operations during decoding, which may not make full use of the features extracted by the encoder and leads to inaccurate predictions at edges and in the regions of maximum depth. We propose an attention-based feature fusion module for the encoder and decoder. We treat monocular depth estimation as a pixel-level optimization problem: the coarsest encoder feature initializes the pixel-level optimization, which is then refined to higher resolution by the proposed attentional feature fusion (AFF). We formulate the prediction problem as ordinal regression over the bin centers that discretize the continuous depth range. The bin distribution adapts to each input image, and the bins are predicted at the coarsest level using global pooling and MLP layers. On the NYUV2 dataset, the proposed architecture improves on the original model by 2.5% and 1.1% in terms of Log10 and absolute relative error, respectively.
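The bin-based formulation above can be illustrated with a small sketch. It assumes a readout that is common in adaptive-bin depth models: per-image bin widths are normalized and converted to bin centers, and each pixel's depth is the probability-weighted sum of those centers. The abstract does not state that the paper uses exactly this readout, and the function names (`bin_centers`, `expected_depth`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(widths, d_min, d_max):
    """Turn per-image (unnormalized, positive) bin widths into bin centers
    that partition the depth range [d_min, d_max]."""
    total = sum(widths)
    centers = []
    left = d_min
    for w in widths:
        span = (d_max - d_min) * w / total  # share of the depth range
        centers.append(left + span / 2.0)
        left += span
    return centers


def expected_depth(pixel_logits, centers):
    """Assumed readout: probability-weighted sum of bin centers for one pixel."""
    probs = softmax(pixel_logits)
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical example: three bins over a 0-8 m range, one pixel.
centers = bin_centers([1.0, 1.0, 2.0], 0.0, 8.0)
depth = expected_depth([0.0, 0.0, 0.0], centers)
```

In a real model the widths would come from the global pooling and MLP head at the coarsest level, and the per-pixel logits from the decoder.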
ISBN (print): 0780384032
Unmanned air vehicles (UAVs) are identified as an integral part of future military forces. The coordinated route-planning problems of UAV teams with various architectures are addressed in the framework of game theory. A two-stage route planner is proposed that combines various game models with the concept of evolutionary computation and is compatible with the cooperative/competitive nature envisioned for UAV teams. The route planner can handle different kinds of mission constraints in a hierarchical style. The potential routes of each vehicle form their own sub-population and evolve only within it, while cooperation and competition among UAVs are reflected in the definition of the fitness function. Experimental results show the feasibility of generating coordinated routes for a UAV team using game-theoretic methods.
Instance segmentation is a comprehensive computer vision task that involves a wide range of other tasks. Recently, real-time instance segmentation methods have received more attention because of the development of autonomous driving. Although existing real-time instance segmentation methods are fast, their accuracy does not meet practical needs. Most methods perform segmentation on top of object detection, so their effectiveness depends heavily on the quality of the detection. This paper proposes a new attention-based multiscale information fusion method built on the network of Cheng, T. et al. [1]. First, the PPM module of the baseline network is replaced with the Multiscale Context Attention (MSCA) module designed in this paper, which uses atrous convolution with different rates to obtain information at four scales and then uses non-local attention to enhance the feature information. This effectively suppresses the interference of redundant information in the instance segmentation results. Second, a new feature fusion approach is designed that replaces bilinear interpolation with sub-pixel upsampling combined with attention. Experiments with this module on the COCO dataset demonstrate its effectiveness, with a 0.5% improvement over the baseline network.
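The sub-pixel upsampling step mentioned above can be sketched in isolation. The block below shows only the channel-to-space rearrangement (often called pixel shuffle) on nested lists. The attention part of the fusion and the preceding convolution are not shown, and the function name `pixel_shuffle` and the channel ordering are assumptions for illustration, not details taken from the paper.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Each group of r*r input channels supplies the r x r block of output
    pixels for one output channel, so resolution grows by a factor of r
    without any interpolation.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside the r x r block
            for j in range(r):      # column offset inside the r x r block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out


# Four 1x1 channels become one 2x2 channel.
upsampled = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2)
```

Unlike bilinear interpolation, the output values here are produced by learned channels in a real network, which is the usual motivation for this kind of upsampling.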
In this paper, we experimentally evaluate three averaging methods for processing electroencephalogram (EEG) event-related potentials (ERPs) measured from the scalp in response to repeated stimuli. In ERP applications, the arithmetic mean (AM) is normally used to process the ERPs prior to ERP detection, although other averaging methods might also have beneficial properties. Fast ERP detection is essential, for example, in brain-computer interfaces and during spine surgery. It is therefore of interest to search for methods that help detect ERPs with as few stimulus repetitions as possible. Here, the noise reduction properties of the AM, geometric mean (GM), and harmonic mean (HM) are demonstrated with simulations, and ERP processing by the three methods is illustrated on real visual evoked potentials (VEPs).
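As a rough illustration of the three averaging methods, the sketch below averages a set of equally long trials sample by sample using Python's `statistics` module. It assumes all samples are strictly positive, which the GM and HM require; real EEG data is signed, and how the paper handles that (for example, by adding an offset) is not stated in the abstract. The function name `average_trials` is hypothetical.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def average_trials(trials, method="am"):
    """Average equally long trials sample by sample.

    trials: list of trials, each a list of strictly positive floats
            (positivity is an assumption needed for "gm" and "hm").
    method: "am" (arithmetic), "gm" (geometric) or "hm" (harmonic).
    """
    reducers = {"am": fmean, "gm": geometric_mean, "hm": harmonic_mean}
    reduce_fn = reducers[method]
    # zip(*trials) groups the k-th sample of every trial together.
    return [reduce_fn(samples) for samples in zip(*trials)]


# Two hypothetical two-sample trials.
trials = [[1.0, 4.0], [4.0, 4.0]]
am = average_trials(trials, "am")
gm = average_trials(trials, "gm")
hm = average_trials(trials, "hm")
```

For positive data the three results always satisfy HM <= GM <= AM at every sample, so the GM and HM give less weight to large outlying values than the AM does.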
This paper generalizes regularized regression problems in a hyper-reproducing kernel Hilbert space (hyper-RKHS), illustrates its utility for kernel learning and out-of-sample extensions, and proves asymptotic convergence results for the introduced regression models from an approximation theory viewpoint. Algorithmically, we consider two regularized regression models with bivariate forms in this space, namely kernel ridge regression (KRR) and support vector regression (SVR) endowed with hyper-RKHS, and further combine divide-and-conquer with Nyström approximation for scalability in large-sample cases. This framework is general: the underlying kernel is learned from a broad class and can be positive definite or not, which adapts to various requirements in kernel learning. Theoretically, we study the convergence behavior of regularized regression algorithms in hyper-RKHS and derive the learning rates, an analysis that goes beyond the classical analysis on RKHS due to the non-trivial independence of pairwise samples and the characterisation of hyper-RKHS. Experimentally, results on several benchmarks suggest that the employed framework is able to learn a general kernel function from an arbitrary similarity matrix, and thus achieves satisfactory performance on classification tasks.
ISBN (print): 0780390156
The number of arithmetic units used in the one-dimensional (1D) discrete wavelet transform (DWT) is the main consideration in reducing the area of a VLSI implementation of the 1D DWT, while the size of the intermediate memory used for data buffering is another dominant factor affecting the hardware complexity of a VLSI implementation of the two-dimensional (2D) DWT. In this paper, we exploit the essential relationship between the size of the temporal buffer (TB) required in the line-based architecture for 2D DWT (LBA2DDWT) and the number of registers used in the 1D DWT module, and present an improved method of mapping the registers used in the 1D DWT to the TB required in LBA2DDWT. Comparisons with other designs reported in the literature demonstrate that the proposed mapping method can efficiently reduce the size of the memory required in LBA2DDWT.
Noise removal is an important problem in many applications. This paper proposes a new two-step decision-based scheme for impulse noise removal that relies on contaminated pixel detection, and compares it with direct order-statistic filtering. The proposed methods give satisfactory results in terms of both objective and subjective image quality.
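A generic two-step decision-based filter can be sketched as follows. This is not the paper's detector, which the abstract does not describe. The sketch assumes salt-and-pepper noise, so step 1 flags pixels at the extreme values 0 and 255, and step 2 replaces only the flagged pixels with the median of the unflagged pixels in their 3x3 neighborhood. All function names are hypothetical.

```python
from statistics import median


def detect_impulses(img, low=0, high=255):
    """Step 1 (assumed detector): flag pixels at the extreme values."""
    return [[p == low or p == high for p in row] for row in img]


def remove_impulses(img, low=0, high=255):
    """Step 2: replace flagged pixels by the median of clean 3x3 neighbors."""
    mask = detect_impulses(img, low, high)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # clean pixels are left untouched
            good = [img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                    if not mask[ny][nx]]
            if good:  # keep the original value if no clean neighbor exists
                out[y][x] = median(good)
    return out


# Hypothetical 3x3 grayscale patch with one "salt" and one "pepper" pixel.
noisy = [[10, 12, 11],
         [13, 255, 12],
         [11, 10, 0]]
restored = remove_impulses(noisy)
```

Direct order-statistic filtering, by contrast, would replace every pixel with its neighborhood median, which also alters uncontaminated pixels and can blur fine detail.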
ISBN (print): 076951695X
We propose a text scanner that detects wide text strings in a sequence of scene images. For scene text detection, we apply a multiple-CAMShift algorithm to a text probability image produced by a multi-layer perceptron. To enhance the resolution of the extracted text images, we perform the text detection process after generating a mosaic image using a fast and robust image registration method.
One-bit measurements widely exist in the real world and can be used to recover sparse signals. This task is known as one-bit compressive sensing (1bit-CS). In this paper, we propose novel algorithms based on both conv...