Pedestrian attribute recognition (PAR) has received extensive attention over the past few years. With advances in deep convolutional neural networks (CNNs), the performance of PAR has improved significantly. Existing methods tend to acquire attribute-specific features by designing complex network structures with additional modules. Such modules, however, dramatically increase the number of network parameters. Meanwhile, the problems of class imbalance and hard attribute retrieval remain underestimated in PAR. In this paper, we explore the optimization mechanism of the training process to address these problems and propose a new loss function called Multi-label Contrastive Focal Loss (MCFL). MCFL emphasizes hard and minority attributes by using a separate re-weighting mechanism for the positive and negative classes, which alleviates the impact of the imbalance. MCFL can also enlarge the intra-class gaps of multi-label attributes, forcing CNNs to extract more subtle discriminative features. We evaluate the proposed MCFL on three large public pedestrian datasets: RAP, PA-100K, and PETA. The experimental results indicate that MCFL with a ResNet-50 backbone outperforms other state-of-the-art approaches in terms of mean accuracy.
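The re-weighting idea can be illustrated with a generic focal-style loss that treats positive and negative labels separately. This is only a sketch and not the MCFL formulation, which the abstract does not give: the exponential weights, the `pos_ratios` argument, and the function name are assumptions chosen for illustration, and the contrastive term is left out entirely.

```python
import math

def weighted_multilabel_focal_loss(probs, labels, pos_ratios, gamma=2.0, eps=1e-7):
    """Focal-style binary cross-entropy averaged over attributes.

    probs      -- predicted probabilities, one per attribute
    labels     -- ground-truth 0/1 labels, one per attribute
    pos_ratios -- fraction of training samples in which each attribute is positive
    gamma      -- focusing exponent; larger values down-weight easy attributes

    The exponential class weights are an illustrative assumption,
    not the weighting used by MCFL.
    """
    total = 0.0
    for p, y, r in zip(probs, labels, pos_ratios):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            # Rare positives (small r) receive a larger weight.
            weight = math.exp(1.0 - r)
            total += -weight * (1.0 - p) ** gamma * math.log(p)
        else:
            # Rare negatives (large r) receive a larger weight.
            weight = math.exp(r)
            total += -weight * p ** gamma * math.log(1.0 - p)
    return total / len(probs)
```

With this sketch, a confidently wrong prediction on a rare positive attribute contributes far more to the loss than a nearly correct prediction on a common one, which is the behaviour the abstract attributes to emphasizing hard and minority attributes.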
The differential evolution (DE) algorithm is an evolutionary algorithm. It is a popular metaheuristic that has efficiently solved various complex optimization problems. This paper proposes a modification of DE, motivated ...
Facial attribute recognition is a popular and challenging research topic in computer vision. In the traditional deep learning based attribute recognition methods, the mid-level network features and the differences bet...
Video surveillance is very important in automatic surveillance. It is usually used to monitor criminal activity and to help find the perpetrator. In the search for an individual among pedestrians when the face image ...
ISBN (digital): 9781665463829
ISBN (print): 9781665463836
Although convolutional neural networks (CNNs) have made significant progress in image segmentation, they remain inadequate for exploring the structural relationships between image components and how graphs can be employed to guide image segmentation. To explore the structural relationships inherent in image components, the Graph Structure Learning Boosted Neural Network is proposed. It takes the contextual information generated by the CNN as node features and then uses a self-supervised graph generator to produce an adjacency matrix representing the connectivity of the image components. A Graph Neural Network (GNN) then uses the adjacency matrix to fuse information between components according to their connectivity, thus transforming the CNN’s pixel classification problem into the GNN’s pixel classification problem. The whole model is lightweight and scalable, and extensive experiments demonstrate both the scalability of the model and the effectiveness of the method.
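The fusion step described above, where node features are mixed according to an adjacency matrix, can be sketched as a single round of neighbourhood averaging. This is a generic message-passing step and not the paper's architecture: the function name, the self-loop, and the row normalization are assumptions, and the adjacency matrix is supplied by hand here rather than learned by a self-supervised generator.

```python
def propagate(features, adjacency):
    """One message-passing step: each node averages over itself and its neighbours.

    features  -- list of per-node feature vectors (lists of floats)
    adjacency -- square matrix; adjacency[i][j] > 0 means node j feeds node i
    """
    n = len(features)
    dim = len(features[0])
    fused = []
    for i in range(n):
        # Add a self-loop so a node keeps part of its own feature (an assumption).
        weights = [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        norm = sum(weights)
        fused.append([
            sum(weights[j] * features[j][d] for j in range(n)) / norm
            for d in range(dim)
        ])
    return fused
```

Connected nodes move toward a shared representation while an isolated node keeps its own feature, so the adjacency matrix alone decides which components exchange information.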
Due to advances in computer graphics technology, it has become very difficult to distinguish computer graphics from photographic images. It is also very difficult to distinguish tampered images from authentic images. ...
Although the appearance of the control chart can help people visually observe the quality variation in the production process of lithium batteries, the impact of the variation on the battery quality cannot be directly...
ISBN (digital): 9781728148762
ISBN (print): 9781728148762
In this paper, our main aim is to simulate a metallic horn antenna feeding a metallic parabolic reflector and to compare factors such as gain, directivity, radiation efficiency, and radiation intensity based on the efficiency factor. The simulation of this horn-fed parabolic reflector antenna is carried out in ANSYS HFSS 13.0. Results such as reflection coefficient, radiation pattern, 3D polar plot, sidelobe level, beamwidth, radiated power, and accepted power have been obtained using HFSS 13.0. The scope of the proposed innovation covers parabolic reflectors fed by a horn antenna. The applications of the designed antenna are discussed in the future scope. The main advantages are also discussed.
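Several of the quantities listed above are linked by standard antenna definitions: radiation efficiency is the ratio of radiated power to accepted power, and gain equals directivity scaled by that efficiency. The short sketch below shows the arithmetic only. The function names and all numeric values are hypothetical and are not results from the paper.

```python
import math

def radiation_efficiency(radiated_power_w, accepted_power_w):
    """Ratio of radiated power to power accepted at the antenna port."""
    return radiated_power_w / accepted_power_w

def gain_dbi(directivity_dbi, efficiency):
    """Gain in dBi: directivity in dBi plus the efficiency expressed in dB."""
    return directivity_dbi + 10.0 * math.log10(efficiency)

# Hypothetical values for illustration only, not taken from the paper.
efficiency = radiation_efficiency(0.9, 1.0)
print(gain_dbi(30.0, efficiency))
```

An efficiency below 1 always pulls the gain below the directivity, which is why the two figures are usually reported together.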
3D human pose estimation from a single 2D video is an extremely difficult task because computing 3D geometry from 2D images is an ill-posed problem. Recent popular solutions adopt a fully supervised learning strategy, which requires training a deep network on a large-scale ground-truth dataset of 3D poses and 2D images. However, such a large-scale dataset with natural images does not exist, which limits the usability of existing methods. While building a complete 3D dataset is tedious and expensive, abundant 2D in-the-wild data is already publicly available. As a consequence, there is growing interest in the computer vision community in designing efficient techniques that use an unsupervised learning strategy, which does not require any ground-truth 3D data. Such methods can be trained with only natural 2D images of humans. In this paper, we propose an unsupervised method for estimating 3D human pose in videos. The standard approach for unsupervised learning is to use the Generative Adversarial Network (GAN) framework. To improve the performance of 3D human pose estimation in videos, we propose a new GAN that enforces body consistency over the frames of a video. We evaluate the efficiency of our proposed method on a public 3D human body dataset.
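The abstract does not say how body consistency over frames is measured. One plausible reading, shown here purely as an assumption, is that the length of each bone should stay constant throughout a clip, so the variance of bone lengths over time can serve as a penalty. The function name and the skeleton encoding are hypothetical.

```python
import math

def bone_length_variance(frames, bones):
    """Mean variance of each bone's length over the frames of a clip.

    frames -- list of poses; each pose is a list of (x, y, z) joint positions
    bones  -- list of (parent_index, child_index) pairs

    A value of 0 means every bone keeps the same length in all frames.
    This penalty is an illustrative assumption, not the paper's loss.
    """
    total = 0.0
    for parent, child in bones:
        lengths = [math.dist(pose[parent], pose[child]) for pose in frames]
        mean = sum(lengths) / len(lengths)
        total += sum((length - mean) ** 2 for length in lengths) / len(lengths)
    return total / len(bones)
```

A rigid limb that only moves or rotates between frames yields zero penalty, while a limb that stretches between frames is penalized in proportion to how much its length changes.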
We propose a new task named Audio-driven Performance Video Generation (APVG), which aims to synthesize the video of a person playing a certain instrument guided by a given music audio clip. Generating high-dimensional, temporally consistent videos from the low-dimensional audio modality is a challenging task. In this paper, we propose a multi-staged framework to generate realistic and synchronized performance video from the given music. First, we provide both global appearance and local spatial information by generating, respectively, the coarse videos and the keypoints of the body and hands from the given music. Then, we transform the generated keypoints into heatmaps via a differentiable space transformer, since a heatmap provides more spatial information but is harder to generate directly from audio. Finally, we propose a Structured Temporal UNet (STU) to extract both intra-frame structured information and inter-frame temporal consistency. These are obtained via a graph-based structure module and a CNN-GRU-based high-level temporal module, respectively, for final video generation. Comprehensive experiments validate the effectiveness of our proposed framework.
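A common way to turn a keypoint into a heatmap is to render a Gaussian bump centred on it. Because the Gaussian is a smooth function of the keypoint coordinates, gradients can flow back to them. The sketch below assumes this rendering and is not the paper's space transformer, whose details the abstract does not give. The function name and the `sigma` default are illustrative.

```python
import math

def keypoint_to_heatmap(x, y, height, width, sigma=1.5):
    """Return a height x width grid with a Gaussian bump centred on (x, y).

    x, y  -- keypoint position in pixel coordinates (column, row)
    sigma -- spread of the bump in pixels (illustrative default)
    """
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
         for col in range(width)]
        for row in range(height)
    ]
```

The heatmap peaks at the keypoint and decays with distance, so it carries the neighbourhood information that a bare coordinate pair lacks.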