ISBN (print): 9781665401913
Human-computer interaction (HCI) is a multidisciplinary field of study focusing on the design of computer technology and, in particular, the interactions between humans and computers. Public space, the area between urban buildings, is open and accessible to people. Public life, which takes place in public spaces, is about human activity, human interaction, and the expression of human feelings in the wild. Affective behavior analysis in public space is a fundamental topic of public life research and is key to achieving HCI applications that comprehensively understand people's feelings, emotions, social behaviors, and their correlations in a 'human-centered' and engaging manner. However, designing a robust HCI system is challenging due to the lack of multi-task datasets (covering emotion, behavior, social relations, etc.) collected under uncontrolled conditions in real public spaces. Although existing separate computer vision datasets can partially meet the requirements of public life research, they are neither captured in real public spaces nor designed for multiple tasks, and thus cannot comprehensively support the joint study of public life. To tackle this issue, this paper presents a multi-task, multi-group, human-oriented video dataset, namely Public Life in Public Space (PLPS). Specifically, multiple tasks, namely activity recognition, emotion recognition, and social relation recognition, are integrated for each video. Multi-group and multi-level labels, at the levels of individuals, groups, and video clips, are included in the dataset. With PLPS, more sophisticated computer vision models for comprehensive public life research can be developed.
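As a concrete illustration of what multi-task, multi-level annotation can look like, the following is a hypothetical sketch of an annotation record linking individual-, group-, and clip-level labels; the field names and label vocabulary are assumptions, not the released PLPS format.

```python
# Hypothetical annotation schema (not the released PLPS format): links
# individual-, group-, and clip-level labels so multi-task analysis stays joint.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PersonLabel:
    person_id: int
    activity: str          # individual-level task, e.g. "walking", "sitting"
    emotion: str           # individual-level task, e.g. "happy", "neutral"

@dataclass
class GroupLabel:
    member_ids: List[int]
    social_relation: str   # group-level task, e.g. "friends", "family"

@dataclass
class ClipAnnotation:
    clip_id: str
    persons: List[PersonLabel] = field(default_factory=list)
    groups: List[GroupLabel] = field(default_factory=list)
    scene_activity: str = ""   # clip-level summary label

# Example: one clip with two people forming a single social group.
clip = ClipAnnotation(
    clip_id="plaza_0001",
    persons=[PersonLabel(0, "walking", "happy"),
             PersonLabel(1, "walking", "neutral")],
    groups=[GroupLabel([0, 1], "friends")],
    scene_activity="strolling",
)
```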
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
In recent years, traffic surveillance systems have begun leveraging fisheye lenses to minimize the number of cameras required for comprehensive coverage of streets and intersections. However, because fisheye images exhibit large radial distortion, they pose new challenges to standard object detection algorithms. In this study, we propose a robust object detection method for traffic scenarios using fisheye cameras. Specifically, we develop a novel data augmentation method, which is applied to the VisDrone dataset; we select this dataset for augmentation because it bears resemblance to the Fisheye8K dataset. Furthermore, we leverage pseudo labels generated by an object detection model pre-trained on the Fisheye8K and original VisDrone datasets to further enrich the training data. Finally, we train various state-of-the-art object detection models on different combinations of the proposed augmented data and combine them with robust ensemble techniques to further enhance the overall object detection performance. As a result, our proposed method achieves a final F1 score of 64.06% on the 2024 AI City Challenge - Track 4 and ranks first among the competing teams.
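The abstract does not specify the exact augmentation, so the following is a minimal sketch of one common way to make perspective images (e.g., VisDrone frames) look fisheye-like, using a simple polynomial barrel-distortion remap; the distortion model, the coefficient k, and the need to also warp bounding boxes accordingly are assumptions for illustration.

```python
# Minimal sketch of a fisheye-style augmentation using a polynomial barrel
# distortion; the actual warp used by the authors is not specified in the abstract.
import cv2
import numpy as np

def barrel_distort(img, k=0.4):
    """Warp a perspective image so it looks fisheye-like (barrel distortion)."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    xn = (xs - cx) / cx                       # normalized destination coords
    yn = (ys - cy) / cy
    r2 = xn * xn + yn * yn
    # A destination pixel at radius r_d samples the source at r_u = r_d*(1 + k*r_d^2),
    # which visually compresses content toward the image border.
    scale = 1.0 + k * r2
    map_x = (xn * scale * cx + cx).astype(np.float32)
    map_y = (yn * scale * cy + cy).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)

# Usage: distorted = barrel_distort(cv2.imread("visdrone_frame.jpg"))
# Note: bounding-box annotations must be remapped with the same transform.
```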
ISBN (print): 9781665445092
We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat). The compact LHFeat flattens the features along the vertical direction and has shown success in modeling per-column modalities for room layout reconstruction. HoHoNet advances in two important aspects. First, the deep architecture is redesigned to run faster with improved accuracy. Second, we propose a novel horizon-to-dense module, which relaxes the per-column output shape constraint and allows per-pixel dense prediction from LHFeat. HoHoNet is fast: it runs at 52 FPS and 110 FPS with ResNet-50 and ResNet-34 backbones, respectively, when modeling dense modalities from a high-resolution 512 x 1024 panorama. HoHoNet is also accurate. On the tasks of layout estimation and semantic segmentation, HoHoNet achieves results on par with the current state of the art. On dense depth estimation, HoHoNet outperforms all prior art by a large margin.
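To make the idea of a per-column latent feature concrete, here is an illustrative sketch (not the exact HoHoNet layers): the height axis of a panorama feature map is squeezed into a 1D feature per image column, and a horizon-to-dense-style decoder expands it back to a per-pixel map. All layer choices and shapes below are assumptions.

```python
# Illustrative sketch of a per-column latent feature plus dense expansion;
# not the actual HoHoNet architecture.
import torch
import torch.nn as nn

class ColumnSqueeze(nn.Module):
    def __init__(self, in_ch, feat_ch):
        super().__init__()
        self.proj = nn.Conv1d(in_ch, feat_ch, kernel_size=1)

    def forward(self, x):                 # x: (B, C, H, W) panorama features
        col = x.mean(dim=2)               # pool out the vertical axis -> (B, C, W)
        return self.proj(col)             # latent horizontal feature (B, F, W)

class HorizonToDense(nn.Module):
    def __init__(self, feat_ch, out_ch, out_h):
        super().__init__()
        self.out_h, self.out_ch = out_h, out_ch
        self.expand = nn.Conv1d(feat_ch, out_ch * out_h, kernel_size=1)

    def forward(self, lhfeat):            # lhfeat: (B, F, W)
        b, _, w = lhfeat.shape
        dense = self.expand(lhfeat)       # (B, out_ch * out_h, W)
        return dense.view(b, self.out_ch, self.out_h, w)

feat = torch.randn(1, 256, 64, 256)       # backbone feature of a 512x1024 pano
lh = ColumnSqueeze(256, 128)(feat)
depth = HorizonToDense(128, 1, 512)(lh)   # dense per-pixel prediction
print(depth.shape)                        # torch.Size([1, 1, 512, 256])
```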
ISBN (digital): 9781665445092
ISBN (print): 9781665445092
Neural implicit functions have emerged as a powerful representation for surfaces in 3D. Such a function can encode a high-quality surface with intricate details into the parameters of a deep neural network. However, optimizing the parameters for accurate and robust reconstructions remains a challenge, especially when the input data is noisy or incomplete. In this work, we develop a hybrid neural surface representation that allows us to impose geometry-aware sampling and regularization, which significantly improves the fidelity of reconstructions. We propose to use iso-points as an explicit representation for a neural implicit function. These points are computed and updated on the fly during training to capture important geometric features and impose geometric constraints on the optimization. We demonstrate that our method can be adopted to improve state-of-the-art techniques for reconstructing neural implicit surfaces from multi-view images or point clouds. Quantitative and qualitative evaluations show that, compared with existing sampling and optimization methods, our approach allows faster convergence, better generalization, and more accurate recovery of details and topology.
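One common way to obtain such explicit points on the zero level set is to project samples along the gradient of the implicit function with Newton-style steps. The sketch below illustrates only that projection step on a toy signed distance function; the paper's full iso-points pipeline (resampling, upsampling, regularization) is omitted.

```python
# Sketch of projecting sample points onto the zero level set of an implicit f,
# using Newton-style steps p <- p - f(p) * grad f(p) / ||grad f(p)||^2.
import torch

def project_to_isosurface(f, points, n_steps=5):
    p = points.clone()
    for _ in range(n_steps):
        p = p.detach().requires_grad_(True)
        val = f(p)                                     # (N, 1) implicit values
        grad, = torch.autograd.grad(val.sum(), p)      # (N, 3) spatial gradients
        p = p - val * grad / (grad.pow(2).sum(-1, keepdim=True) + 1e-8)
    return p.detach()

# Usage with a toy implicit function (unit-sphere SDF):
sphere = lambda x: x.norm(dim=-1, keepdim=True) - 1.0
pts = torch.randn(1024, 3)
iso_pts = project_to_isosurface(sphere, pts)
print(sphere(iso_pts).abs().max())   # values close to zero after projection
```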
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Breast cancer is the second most prevalent form of cancer and is the “leading cause of most cancer-related deaths in women”. Most women living in low- and middle-income countries (LMIC) have limited access to existing, often poor, health systems, restricted access to treatment facilities, and, in general, a lack of breast cancer screening programmes. The likelihood of women living in LMIC presenting at a health facility with advanced-stage breast cancer is very high, and the chances of them being able to afford treatment at that stage, even if treatment is available, are very low. In this work, we evaluate the capabilities of deep learning as a classification tool with the aim of detecting cancerous ultrasound breast images. We aim to deploy a simple classifier on a mobile device with an inexpensive handheld ultrasound imaging system to pick up breast cancer cases that need medical attention. We demonstrate that, with a minimal number of ultrasound images, a de novo system trained from scratch can achieve an accuracy of close to 64%, rising to about 78% when the same model is pre-trained.
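The two training regimes compared above can be sketched as follows; the backbone choice (ResNet-18), the small from-scratch CNN, and the binary benign/malignant setup are illustrative assumptions rather than the authors' exact models.

```python
# Sketch of the two settings compared in the abstract: a small classifier trained
# from scratch versus an ImageNet pre-trained backbone fine-tuned on the same task.
import torch.nn as nn
from torchvision import models

def from_scratch_classifier(num_classes=2):
    # Tiny CNN for single-channel (grayscale) ultrasound frames, trained de novo.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, num_classes),
    )

def pretrained_classifier(num_classes=2):
    # ImageNet-pretrained backbone with a new classification head; grayscale
    # frames would be replicated to 3 channels before being fed in.
    backbone = models.resnet18(weights="DEFAULT")
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone
```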
ISBN (print): 9781665445092
The recently introduced introspective variational autoencoder (IntroVAE) exhibits outstanding image generation and allows for amortized inference using an image encoder. The main idea in IntroVAE is to train a VAE adversarially, using the VAE encoder to discriminate between generated and real data samples. However, the original IntroVAE loss function relied on a particular hinge-loss formulation that is very hard to stabilize in practice, and its theoretical convergence analysis ignored important terms in the loss. In this work, we take a step towards a better understanding of the IntroVAE model, its practical implementation, and its applications. We propose Soft-IntroVAE, a modified IntroVAE that replaces the hinge-loss terms with a smooth exponential loss on generated samples. This change significantly improves training stability and also enables theoretical analysis of the complete algorithm. Interestingly, we show that the IntroVAE converges to a distribution that minimizes a sum of KL distance from the data distribution and an entropy term. We discuss the implications of this result and demonstrate that it induces competitive image generation and reconstruction. Finally, we describe an application of Soft-IntroVAE to unsupervised image translation and demonstrate compelling results. Code and additional information are available on the project website - ***/soft-intro-vae-web.
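The following minimal sketch contrasts the two loss shapes described above, a hard hinge versus a smooth exponential, applied to some adversarial statistic of generated samples; signs, scaling constants, and the full objective are simplified, so this only illustrates the hard-threshold versus smooth shaping, not the exact Soft-IntroVAE formulation.

```python
# Hard hinge vs. smooth exponential shaping of the generated-sample term;
# a simplified illustration, not the paper's exact loss.
import torch

def hinge_term(stat, margin=10.0):
    # IntroVAE-style: gradient vanishes abruptly once stat exceeds the margin.
    return torch.relu(margin - stat).mean()

def soft_exp_term(stat, alpha=2.0):
    # Soft-IntroVAE-style: smooth everywhere, no hard gradient cut-off.
    return torch.exp(-alpha * stat).mean() / alpha

stat = torch.linspace(-5, 20, 6)   # e.g. a KL / negative-ELBO statistic of fakes
print(hinge_term(stat), soft_exp_term(stat))
```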
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Despite having completely different configurations, deep learning architectures learn a specific set of features that are common across architectures. For example, the initial few layers learn low-level edge features from the images. Based on this fact, in this research, we showcase the potential of deep neural network fusion for simple and effective deepfake detection. The advantage of building an architecture in this manner is that it yields an accurate, low-power defense that can be deployed on mobile devices. To utilize pre-trained knowledge and obtain downstream task-specific knowledge, we identify a breakpoint in different networks and divide the knowledge of a network into fixed and adaptive information. We keep the fixed knowledge intact while modifying the adaptive knowledge, along with entirely new knowledge, for the deepfake detection task. In the end, the decisions of multiple deep architectures, each trained based on its breakpoint, are combined for improved performance. Extensive comparisons with existing state-of-the-art architectures demonstrate the effectiveness of the proposed deepfake detection algorithm. The proposed algorithm not only surpasses existing state-of-the-art (SOTA) algorithms but also requires low computational power. We further challenge the proposed algorithm by evaluating it on real-world deepfake images that we collected.
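A hedged sketch of the breakpoint idea follows: the early (fixed) part of a pre-trained network is frozen, the later (adaptive) part plus a new head is fine-tuned for deepfake detection, and the decisions of several such networks are averaged. The backbone, the breakpoint index, and the fusion rule are assumptions for illustration.

```python
# Hypothetical breakpoint split and decision fusion; layer indices and the
# averaging rule are illustrative, not the authors' exact configuration.
import torch
import torch.nn as nn
from torchvision import models

def split_at_breakpoint(num_frozen_children=6, num_classes=2):
    net = models.resnet18(weights="DEFAULT")
    children = list(net.children())
    fixed = nn.Sequential(*children[:num_frozen_children])       # kept intact
    adaptive = nn.Sequential(*children[num_frozen_children:-1])  # fine-tuned
    head = nn.Linear(net.fc.in_features, num_classes)            # new knowledge
    for p in fixed.parameters():
        p.requires_grad = False                                   # freeze fixed part
    return nn.Sequential(fixed, adaptive, nn.Flatten(), head)

def fused_prediction(model_list, x):
    # Combine the decisions of multiple breakpoint-trained networks.
    probs = [torch.softmax(m(x), dim=1) for m in model_list]
    return torch.stack(probs).mean(dim=0)
```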
ISBN (print): 9781665445092
Although face recognition has achieved great success in recent years, it is still challenging to recognize facial images with extreme poses. Traditional methods consider it a domain gap problem. Many of them address it by generating fake frontal faces from extreme ones, but such approaches struggle to maintain identity information and suffer from high computational consumption and uncontrolled disturbances. Our experimental analysis shows a dramatic precision drop for extreme poses. Meanwhile, these extreme poses exhibit only minor visual differences after small rotations. Derived from this insight, we attempt to alleviate this precision drop by making minor changes to the input images without modifying existing discriminators. A novel lightweight pseudo facial generation method is proposed to alleviate the problem of extreme poses without generating any frontal facial image. It depicts the facial contour information and makes appropriate modifications to preserve the critical identity information. Specifically, the proposed method reconstructs pseudo profile faces by minimizing the pixel-wise differences with the original profile faces while simultaneously maintaining identity-consistent information from their corresponding frontal faces. The proposed framework can improve existing discriminators and achieves substantial gains on several benchmark datasets.
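The training signal described above can be sketched roughly as a pixel-wise term that keeps the pseudo face close to the input profile plus an identity-consistency term toward the frontal embedding from a frozen recognizer; the loss weights, the L1/cosine choices, and the function names below are assumptions.

```python
# Rough sketch of the pseudo facial generation objective: small pixel changes
# plus identity consistency; weights and distance choices are assumptions.
import torch.nn.functional as F

def pseudo_face_loss(G, face_embed, profile, frontal, lam=10.0):
    pseudo = G(profile)                               # lightweight generator output
    pixel_loss = F.l1_loss(pseudo, profile)           # keep changes to the profile minor
    id_loss = 1.0 - F.cosine_similarity(
        face_embed(pseudo), face_embed(frontal), dim=1).mean()  # frozen recognizer
    return lam * pixel_loss + id_loss
```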
ISBN (digital): 9798331536626
ISBN (print): 9798331536633
Facial brightness is a key image quality factor impacting face recognition accuracy differentials across demographic groups. In this work, we aim to decrease the accuracy gap between the similarity score distributions for Caucasian and African American female mated image pairs, as measured by d’ between distributions. To balance brightness across demographic groups, we conduct three experiments, interpreting brightness in the face skin region either as median pixel value or as the distribution of pixel values. Balancing based on median brightness alone yields up to a 46.8% decrease in d’, while balancing based on brightness distribution yields up to a 57.6% decrease. In all three cases, the similarity scores of the individual distributions improve, with mean scores maximally improving 5.9% for Caucasian females and 3.7% for African American females.
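A minimal sketch of median-based brightness balancing follows, assuming the face-skin region is given as a binary mask and that balancing is done by an additive shift of pixel values; the target median and the skin-segmentation step are assumptions, not necessarily the authors' exact procedure.

```python
# Minimal sketch: shift an image so the median brightness of its skin region
# matches a common target; the skin mask and target value are assumed inputs.
import numpy as np

def balance_median_brightness(gray_img, skin_mask, target_median=128.0):
    """Shift an 8-bit grayscale face image so its skin-region median hits the target."""
    skin_pixels = gray_img[skin_mask > 0]
    shift = target_median - np.median(skin_pixels)
    return np.clip(gray_img.astype(np.float32) + shift, 0, 255).astype(np.uint8)
```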
ISBN (print): 9781665445092
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of shared information (expression similarities) across different expressions and unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships among the latent features to characterize expression-specific variations and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
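A simplified sketch of the decomposition step is given below: a backbone feature is projected into K facial action-aware latent features by separate linear branches and re-combined with learned weights into an expression feature; the relation-modeling modules of FRN are omitted and all sizes are illustrative.

```python
# Simplified decomposition-and-reconstruction sketch; FRN's intra-/inter-feature
# relation modeling is omitted and all dimensions are illustrative.
import torch
import torch.nn as nn

class SimpleFDN(nn.Module):
    def __init__(self, in_dim=512, latent_dim=64, num_latents=9):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Linear(in_dim, latent_dim) for _ in range(num_latents))
        self.weights = nn.Linear(in_dim, num_latents)

    def forward(self, feat):                       # feat: (B, in_dim) backbone feature
        latents = torch.stack([b(feat) for b in self.branches], dim=1)  # (B, K, D)
        w = torch.softmax(self.weights(feat), dim=1).unsqueeze(-1)      # (B, K, 1)
        return (w * latents).sum(dim=1), latents   # reconstructed feature, latents

expr_feat, latents = SimpleFDN()(torch.randn(8, 512))
print(expr_feat.shape, latents.shape)              # (8, 64) (8, 9, 64)
```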