Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of image...
ISBN: 9781509015535 (print)
Recently, high-level pose features (HLPF) have been shown to be efficient for action recognition in joint-annotated tasks. However, HLPF are constructed without considering either the relative positions between pairs of joints in actual situations or spatio-temporal information. To address these limitations, we propose a set of novel high-level pose features (NHLPF). Specifically, considering that the distances between adjacent pairs of joints usually remain unchanged, we propose a horizontal relative position feature and a vertical relative position feature. In addition, a joint inner product feature is proposed to encode the spatial information among each triplet of joints. To encode temporal information, we calculate the trajectories of these three types of features as corresponding trajectory features. Furthermore, to combine the spatial and temporal information, we present a joint energy change feature, which is designed using observations of the magnitude and direction of the force between joints. We evaluate our NHLPF on a benchmark dataset. The results show that NHLPF are superior features for action recognition.
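The abstract does not give formulas for these features, so the following is only a minimal sketch under stated assumptions: joints are 2D points, the relative position features are taken as coordinate differences over joint pairs, the inner product feature is the dot product of the two vectors meeting at the middle joint of a triplet, and a trajectory is a frame-to-frame difference. The function names are hypothetical and not taken from the paper.

```python
from itertools import combinations


def relative_positions(joints):
    """Assumed definition: x- and y-differences for every pair of joints.

    joints is a list of (x, y) tuples for one frame.
    """
    pairs = list(combinations(range(len(joints)), 2))
    horizontal = [joints[i][0] - joints[j][0] for i, j in pairs]
    vertical = [joints[i][1] - joints[j][1] for i, j in pairs]
    return horizontal, vertical


def inner_products(joints):
    """Assumed definition: for each triplet (i, j, k), the dot product
    of the vectors from joint j to joint i and from joint j to joint k."""
    feats = []
    for i, j, k in combinations(range(len(joints)), 3):
        ax, ay = joints[i][0] - joints[j][0], joints[i][1] - joints[j][1]
        bx, by = joints[k][0] - joints[j][0], joints[k][1] - joints[j][1]
        feats.append(ax * bx + ay * by)
    return feats


def trajectory(per_frame_features):
    """Assumed definition: element-wise difference between consecutive frames."""
    return [
        [b - a for a, b in zip(f0, f1)]
        for f0, f1 in zip(per_frame_features, per_frame_features[1:])
    ]
```

For a video, one would compute the per-frame features for every frame and pass the resulting list to `trajectory`; the joint energy change feature is not sketched here because the abstract gives too little detail about it.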
ISBN: 9781509053179 (print)
With the explosive growth in the number of mobile terminals, the demand for visual communication with mobility is increasing. However, traditional solutions for mobility over IP networks cannot always deliver satisfactory visual communication. Named Data Networking (NDN), a new communication model that aims to replace the IP model, provides a different setting for mobile visual communication problems. In this paper, we take advantage of the NDN model to realize seamless mobile visual communication. We introduce into the NDN mechanism a delegate with calculation functions and a globally unique identifier (GUID) that provides native identity indication. The use of the GUID benefits real-time applications such as visual communication and, together with the delegate, reduces unnecessary routing updates. We also specify the naming rule and design a FIB+ to support seamless mobile visual communication. To test the performance of our solution, we build a proof-of-concept prototype and run experiments on it. The experiments demonstrate that our solution can provide real-time video communication with a seamless mobility experience.
ISBN: 9781479953424 (print)
In this paper, we consider video communication over a fading channel, where perfect instantaneous channel state information (CSI) is available at both the sender and the receiver. Most existing coding schemes are inefficient in this communication scenario. Digital coding schemes have high coding efficiency but unavoidably lead to the cliff effect, while analog schemes offer graceful video quality variation as the channel varies but have low coding efficiency. Hence, to integrate the advantages of digital and analog coding, we propose a hybrid digital-analog (HDA) scheme. In our scheme, we adopt adaptive power allocation and adaptive forward error coding (FEC) in the digital part to accommodate the instantaneous channel quality. The evaluation results show that the proposed HDA scheme outperforms Parcast (a state-of-the-art analog scheme) by 0.3 to 2.2 dB for channel signal-to-noise ratios (SNR) from 3 dB to 20 dB.
ISBN: 9781509053179 (print)
Part-based trackers have achieved promising performance in many tracking tasks. However, most part-based trackers use the same feature representation for all parts and simply combine them to form an integral representation of the tracking target. This does not guarantee that every part of the tracking target can distinguish the foreground from the background well. Better performance is expected from exploring different feature representations on different parts of the tracking target. In this paper, following the framework of the classic Compressive Tracker (CT), we model each part of the target adaptively by using a multi-dimensional color representation. Using color names, we select the color feature representation that best distinguishes the foreground from the background. To better handle deformation and illumination change, we use multiple Gaussians to model different appearance changes of the tracking target. Both qualitative and quantitative evaluations demonstrate that the proposed method consistently improves on the conventional Compressive Tracker on a tracking benchmark dataset. It also outperforms many state-of-the-art trackers while running at an average of 20 frames per second (FPS).
ISBN: 9781479999897 (print)
Content-aware image retargeting has attracted substantial interest in the research community. However, there is still no method that can preserve important image contents and structure well without introducing deformation. To address this problem, we propose a Saliency & Structure Preserving Multi-operator (SSPM) method. SSPM classifies images into three categories using SIFT density to improve saliency preservation, which helps to mitigate the negative influence of the center-bias property of most existing saliency detection models. SSPM also employs different principles to improve structure preservation, including Earth Mover's Distance (EMD) and the Gray-Level Co-occurrence Matrix (GLCM), to obtain optimal operator sequences for content-aware image retargeting. The SSPM method not only preserves salient contents and structure well, but also greatly improves deformation resilience. Experimental results demonstrate that our method outperforms state-of-the-art image retargeting methods.
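The abstract does not say how the GLCM enters the structure-preservation measure, so the sketch below only illustrates what a gray-level co-occurrence matrix is: a count of how often gray level a occurs next to gray level b at a fixed pixel offset. The `contrast` statistic is a commonly used GLCM summary and is an illustrative assumption here, not necessarily the quantity SSPM uses.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at offset (dx, dy).

    image is a list of rows of integer gray levels in [0, levels).
    counts[a][b] is the number of pixels with level a whose neighbor
    at (row + dy, col + dx) has level b.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def contrast(counts):
    """GLCM contrast: mean squared gray-level difference over counted pairs."""
    total = sum(sum(row) for row in counts)
    weighted = sum(
        (i - j) ** 2 * n
        for i, row in enumerate(counts)
        for j, n in enumerate(row)
    )
    return weighted / total
```

Comparing such statistics before and after retargeting is one plausible way to quantify structural change, though the paper's actual energy term may differ.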
This paper addresses how to coordinate depth with RGB more effectively in order to boost the performance of RGB-D object detection. In particular, we investigate two primary ideas under the CNN model: property derivation and property fusion. First, we propose that depth can be used not only as extra information alongside RGB but also to derive more visual properties that comprehensively describe the objects of interest. We therefore construct a two-stage learning framework consisting of property derivation and fusion. Here the properties can be derived either from the provided color or depth alone or from their pairs (e.g., the geometry contour adopted in this paper). Second, we explore how to fuse different properties in feature learning, which, under the CNN model, comes down to the layer at which the properties should be fused. The analysis shows that different semantic properties should be learned separately and combined before being passed to the final classifier. This way of detecting objects is in accordance with the mechanism of the primary visual cortex (V1) in the brain. We experimentally evaluate the proposed method on a challenging dataset and achieve state-of-the-art performance.
ISBN: 9781509003556 (print)
Visual quality assessment of 3D/stereoscopic video (3D VQA) is significant for both quality monitoring and optimization of existing 3D video services. In this paper, we build a 3D video database based on the latest 3D-HEVC video coding standard to investigate the relationship among video quality, depth quality, and overall quality of experience (QoE) of 3D/stereoscopic video. We also analyze the pivotal factors affecting the video and depth qualities. Moreover, we develop a no-reference (NR) 3D-HEVC bitstream-level objective video quality assessment model, which uses key features extracted from the 3D video bitstreams to assess the perceived quality of the stereoscopic video. The model is verified to be effective on our database in comparison with widely used 2D full-reference (FR) quality metrics as well as a state-of-the-art 3D FR pixel-level video quality metric.
ISBN: 9781509053179 (print)
Content-aware image retargeting has attracted substantial interest in the research community. However, there is still no method that can adequately preserve important image contents and structure without introducing conspicuous visible deformation, and do so in a relatively short time. To address this problem, we propose a Fast Genetic Multi-operator (FGM) method, which integrates multiple retargeting operators. To improve efficiency, the FGM method uses Genetic Algorithms (GAs) to reach the optimal operator ratio, with saliency and the Gray-Level Co-occurrence Matrix (GLCM) as the energy function. The FGM method not only preserves salient contents and structure well, but also greatly reduces the computational complexity. Experimental results demonstrate that our method outperforms state-of-the-art image retargeting methods.
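To make the idea of searching for an operator ratio with a genetic algorithm concrete, here is a minimal sketch. Everything specific in it is an assumption: the number of operators, the population settings, the averaging crossover with Gaussian mutation, and above all the `energy` callable, which stands in for the saliency and GLCM based energy that the abstract mentions but does not define.

```python
import random


def normalize(weights):
    """Clamp to positive values and rescale so the ratios sum to 1."""
    clamped = [max(w, 1e-9) for w in weights]
    total = sum(clamped)
    return [w / total for w in clamped]


def evolve(energy, n_ops=3, pop_size=20, generations=40, seed=0):
    """Minimize energy(ratios) over operator ratios with a simple GA.

    energy is a hypothetical callable mapping a list of n_ops ratios
    (summing to 1) to a cost; lower is better.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    pop = [normalize([rng.random() for _ in range(n_ops)])
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=energy)
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Crossover by averaging, then small Gaussian mutation.
            child = [(x + y) / 2 + rng.gauss(0, 0.05) for x, y in zip(a, b)]
            children.append(normalize(child))
        pop = elite + children
    return min(pop, key=energy)
```

As a toy usage, `evolve(lambda r: sum((x - t) ** 2 for x, t in zip(r, [0.5, 0.3, 0.2])))` searches for ratios close to a made-up target; in the FGM setting the energy would instead score the retargeted image.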
The directional intra prediction (DIP) modes in HEVC are capable of predicting local continuous image features. Recently, intra block copy (IBC) was proposed for screen content coding, aiming at predicting non-local recurrent image features. For natural video, we observe that recurrent features are often irregular and not aligned with blocks. We therefore propose a combination of DIP and IBC with block partitioning for better intra prediction, in which one block can be divided into several partitions, each of which may choose between DIP and IBC. We study an intra prediction scheme with the proposed combination, in particular its rate-distortion optimization and entropy coding. Preliminary experimental results show that the proposed combined intra prediction achieves bit-rate savings of up to 5.8% compared to the HEVC anchor.
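The per-partition choice between DIP and IBC can be illustrated with the standard Lagrangian rate-distortion cost J = D + λR. The sketch below is not the paper's encoder: the data layout, the function names, and the distortion and rate numbers in the usage note are hypothetical, and it ignores any dependency between partitions or signaling overhead for the partition itself.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_modes(partitions, lam):
    """Pick the cheapest mode independently for each partition.

    partitions is a list of dicts mapping a mode name (e.g. "DIP", "IBC")
    to a (distortion, rate_bits) tuple measured for that partition.
    Returns the chosen mode per partition and the summed cost.
    """
    decisions = []
    total = 0.0
    for candidates in partitions:
        mode, (dist, rate) = min(
            candidates.items(),
            key=lambda kv: rd_cost(kv[1][0], kv[1][1], lam),
        )
        decisions.append(mode)
        total += rd_cost(dist, rate, lam)
    return decisions, total
```

With made-up measurements `[{"DIP": (100.0, 10), "IBC": (60.0, 30)}, {"DIP": (40.0, 8), "IBC": (45.0, 20)}]`, a small λ favors IBC for the first partition because its lower distortion outweighs the extra bits, while a larger λ pushes both partitions to the cheaper-to-signal DIP.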