In this paper, a 3D dangerous goods detection method based on RetinaNet is proposed. This method uses the bidirectional feature pyramid network structure of RetinaNet to extract multi-scale features from point cloud d...
详细信息
Learning comprehensive spatiotemporal features is crucial for human action recognition. Existing methods tend to model the spatiotemporal feature blocks in an integrate-separate-integrate form, such as appearance-and-...
详细信息
Learning comprehensive spatiotemporal features is crucial for human action recognition. Existing methods tend to model the spatiotemporal feature blocks in an integrate-separate-integrate form, such as appearance-and-relation network(ARTNet) and spatiotemporal and motion network(STM). However, with blocks stacking up, the rear part of the network has poor interpretability. To avoid this problem, we propose a novel architecture called spatial temporal relation network(STRNet), which can learn explicit information of appearance, motion and especially the temporal relation information. Specifically, our STRNet is constructed by three branches,which separates the features into 1) appearance pathway, to obtain spatial semantics, 2) motion pathway, to reinforce the spatiotemporal feature representation, and 3) relation pathway, to focus on capturing temporal relation details of successive frames and to explore long-term representation dependency. In addition, our STRNet does not just simply merge the multi-branch information, but we apply a flexible and effective strategy to fuse the complementary information from multiple pathways. We evaluate our network on four major action recognition benchmarks: Kinetics-400, UCF-101, HMDB-51, and Something-Something v1, demonstrating that the performance of our STRNet achieves the state-of-the-art result on the UCF-101 and HMDB-51 datasets, as well as a comparable accuracy with the state-of-the-art method on Something-Something v1 and Kinetics-400.
Monocular 6D pose estimation is a functional task in the field of com-puter vision and *** recent years,2D-3D correspondence-based methods have achieved improved performance in multiview and depth data-based ***,for m...
详细信息
Monocular 6D pose estimation is a functional task in the field of com-puter vision and *** recent years,2D-3D correspondence-based methods have achieved improved performance in multiview and depth data-based ***,for monocular 6D pose estimation,these methods are affected by the prediction results of the 2D-3D correspondences and the robustness of the per-spective-n-point(PnP)*** is still a difference in the distance from the expected estimation *** obtain a more effective feature representation result,edge enhancement is proposed to increase the shape information of the object by analyzing the influence of inaccurate 2D-3D matching on 6D pose regression and comparing the effectiveness of the intermediate ***,although the transformation matrix is composed of rotation and translation matrices from 3D model points to 2D pixel points,the two variables are essentially different and the same network cannot be used for both variables in the regression ***,to improve the effectiveness of the PnP algo-rithm,this paper designs a dual-branch PnP network to predict rotation and trans-lation ***,the proposed method is verified on the public LM,LM-O and YCB-Video *** ADD(S)values of the proposed method are 94.2 and 62.84 on the LM and LM-O datasets,*** AUC of ADD(-S)value on YCB-Video is *** experimental results show that the performance of the proposed method is superior to that of similar methods.
We propose an extension of RBF networks which includes a mechanism for optimizing the complexity of the network. The approach involves two procedures: adaptation (training) and selection. The first procedure adaptivel...
详细信息
Video stylization transfers a source video into an artistic version while maintaining temporal coherence between adjacent frames. In this paper, we formulate the unsupervised example-based video stylization with Marko...
详细信息
ISBN:
(纸本)9781450306164
Video stylization transfers a source video into an artistic version while maintaining temporal coherence between adjacent frames. In this paper, we formulate the unsupervised example-based video stylization with Markov random field model. In our algorithm, we implement an improved optical flow algorithm to maintain temporal coherence while improve the accuracy of estimation along motion boundaries. We also extend our algorithm to the application of video personalization, in which human faces keep clear and distinguishable. A series of techniques are fused in video personalization, including face detection and alignment, motion flow, skin detection, and illumination blending. Given a source video and a style template image, our algorithm produces the stylized and/or personalized video(s) automatically. Experimental results demonstrate that our algorithm performs excellently in both video stylization and personalization. Copyright 2011 ACM.
As an important technology in the domains of ocean exploration and underwater robotics, underwater optical image enhancement has drawn significant attention. However, underwater images suffer from severe degradation d...
详细信息
Temporal action localization is a challenging task in video understanding. Although great progress has been made in temporal action localization, the most advanced methods still have the problem of sharp performance d...
详细信息
Motion estimation is a basic issue for many computervision tasks, such as human-computer interaction, motion objection detection and intelligent robot. In many practical scenes, the object movement goes with camera m...
详细信息
Motion estimation is a basic issue for many computervision tasks, such as human-computer interaction, motion objection detection and intelligent robot. In many practical scenes, the object movement goes with camera motion. Generally, motion descriptors directly based on optical flow are inaccurate and have low discrimination power. To this end, a novel motion correction method is proposed and a novel motion feature descriptor called the motion difference histogram (MDH) for recognising human action is proposed in this study. Motion estimation results are corrected by background motion estimation and MDH encodes the motion difference between the background and the objects. Experimental results on video shot with camera motion show that the proposed motion correction method is effective and the recognition accuracy of MDH is better than that of the state-of-the-art motion descriptor.
This paper proposes a novel approach to single image super-resolution. First, an image up-sampling scheme is proposed which takes the advantages of both bilateral filtering and mean shift image segmentation. Then we u...
详细信息
ISBN:
(纸本)9781450306164
This paper proposes a novel approach to single image super-resolution. First, an image up-sampling scheme is proposed which takes the advantages of both bilateral filtering and mean shift image segmentation. Then we use a shock filter to enhance strong edges in the initial up-sampling result and obtain an intermediate high-resolution image. Finally, we enforce a reconstruction constraint on the high-resolution image so that fine details can be inferred by back projection. Since strong edges in the intermediate result are enhanced, ringing artifacts can be suppressed in the back projection step. We compare our algorithm with several state-of-the-art image super-resolution algorithms. Qualitative and quantitative experimental results demonstrate that our approach performs the best. Copyright 2011 ACM.
暂无评论