Remote photoplethysmography (rPPG) is a non-contact method that uses facial video to predict changes in blood volume, enabling the measurement of physiological metrics. Traditional rPPG models often struggle with poor genera...
Integrated sensing and communication (ISAC) has been considered a key feature of next-generation wireless networks. This paper investigates the joint design of the radar receive filter and dual-functional transmit wav...
Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we ela...
ISBN (digital): 9781665468190
ISBN (print): 9781665468206
Recently, deep multi-instance neural networks have been successfully applied for medical image classification, where only image-level labels rather than fine-grained patch-level labels are available for use. One key issue for these multi-instance neural networks is how to aggregate all patch (instance) features into an entire image (bag) representation, referred to as multi-instance pooling, e.g., max, mean, and attention based pooling. Nevertheless, these multi-instance pooling operations do not take the structural information within an image into account. This is obviously inappropriate for medical image classification since there often exist certain dependencies among regional patches/lesions. We propose an adaptive recurrent pooling based deep multi-instance neural network in this paper. In this network, we first extract multi-view global structural features from every bag using the self-attention mechanism, and then aggregate these multi-view features into a whole bag representation based on the adaptive recurrent pooling operation in order to further capture the contextual information within the bag. Moreover, we introduce the cross-normalization operation used in the Unit Force Operated vision Transformer into the self-attention module to reduce its computational complexity. We have experimentally evaluated the performance of the proposed network on three medical image datasets, namely UCSB breast, Messidor, and Colon cancer. The results demonstrate the advantage of our network over current state-of-the-art deep multi-instance networks in terms of classification accuracy and interpretability.
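The three baseline multi-instance pooling operations named in the abstract above (max, mean, and attention-based pooling) can be illustrated with a minimal sketch. This is not the adaptive recurrent pooling proposed in the paper. The function names and the linear scoring vector `w` are assumptions made for illustration; attention-based MIL pooling in practice typically scores instances with a small learned network.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def max_pooling(bag: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum over all instance feature vectors in the bag."""
    return [max(column) for column in zip(*bag)]


def mean_pooling(bag: Sequence[Sequence[float]]) -> Vector:
    """Element-wise average over all instance feature vectors in the bag."""
    return [sum(column) / len(column) for column in zip(*bag)]


def attention_pooling(
    bag: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[Vector, List[float]]:
    """Weighted sum of instances, with weights given by a softmax over scores.

    Assumption: each instance h is scored with a plain dot product w . h.
    Returns the pooled bag vector and the attention weights.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in bag]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, bag)) for d in range(dim)]
    return pooled, alphas


if __name__ == "__main__":
    bag = [[1.0, 0.0], [3.0, 2.0]]  # two instances, two features each
    print(max_pooling(bag))
    print(mean_pooling(bag))
    print(attention_pooling(bag, [1.0, 0.0]))
```

All three operations treat the bag as an unordered set, which is the limitation the abstract points out: none of them models dependencies among patches.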
ISBN (digital): 9781665468190
ISBN (print): 9781665468206
Multi-Instance Learning (MIL) is a weakly supervised learning paradigm, in which every training example is a labeled bag of unlabeled instances. In typical MIL applications, instances are often used for describing the features of regions/parts in a whole object, e.g., regional patches/lesions in an eye-fundus image. However, for a (semantically) complex part the standard MIL formulation puts a heavy burden on the representation ability of the corresponding instance. To alleviate this pressure, we still adopt a bag-of-instances as an example in this paper, but extract from each instance a set of representations using $1 \times 1$ convolutions. The advantages of this tactic are two-fold: i) This set of representations can be regarded as multi-view representations for an instance; ii) Compared to building multi-view representations directly from scratch, extracting them automatically using $1 \times 1$ convolutions is more economical, and may be more effective since $1 \times 1$ convolutions can be embedded into the whole network. Furthermore, we apply two consecutive multi-instance pooling operations on the reconstituted bag that has actually become a bag of sets of multi-view representations. We have conducted extensive experiments on several canonical MIL data sets from different application domains. The experimental results show that the proposed framework outperforms the standard MIL formulation in terms of classification performance and has good interpretability.
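The idea of turning each instance into a set of multi-view representations and then pooling twice can be sketched as follows. At a single spatial position, a $1 \times 1$ convolution reduces to a matrix-vector product over channels, so the sketch uses plain lists. The function names, the choice of max pooling over views followed by mean pooling over instances, and the fixed weights are all assumptions for illustration; the abstract does not specify which pooling operators the paper uses, and the real weights would be learned.

```python
from typing import List, Sequence

Vector = List[float]


def conv1x1(x: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """1x1 convolution at one spatial position: each output channel is a
    weighted sum of the input channels (one row of `weights` per output)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def multi_view(
    x: Sequence[float], weights: Sequence[Sequence[float]], num_views: int
) -> List[Vector]:
    """Split the output channels into `num_views` equally sized views."""
    out = conv1x1(x, weights)
    dim = len(out) // num_views
    return [out[v * dim:(v + 1) * dim] for v in range(num_views)]


def pool_max(vectors: Sequence[Sequence[float]]) -> Vector:
    return [max(column) for column in zip(*vectors)]


def pool_mean(vectors: Sequence[Sequence[float]]) -> Vector:
    return [sum(column) / len(column) for column in zip(*vectors)]


def bag_representation(
    bag: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    num_views: int,
) -> Vector:
    """Two consecutive pooling steps (assumed operators):
    1) max over the views of each instance, 2) mean over the instances."""
    per_instance = [pool_max(multi_view(x, weights, num_views)) for x in bag]
    return pool_mean(per_instance)


if __name__ == "__main__":
    # 2 input channels -> 4 output channels, read as 2 views of 2 dims each.
    weights = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
    bag = [[1.0, 2.0], [3.0, -1.0]]
    print(bag_representation(bag, weights, num_views=2))
```

Because the view extraction is just a linear layer over channels, it can be trained end to end with the rest of the network, which is the economy argument made in the abstract.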
High-fidelity kinship face synthesis has many potential applications, such as kinship verification, missing child identification, and social media analysis. However, it is challenging to synthesize high-quality descendant faces with genetic relations due to the lack of large-scale, high-quality annotated kinship data. This paper proposes the RFG (Region-level Facial Gene) extraction framework to address this issue. We propose to use an IGE (Image-based Gene Encoder), an LGE (Latent-based Gene Encoder), and a Gene Decoder to learn the RFGs of a given face image, and the relationships between RFGs and the latent space of StyleGAN2. As cycle-like losses are designed to measure the $\mathcal{L}_{2}$ distances between the outputs of the Gene Decoder and the image encoder, and between the outputs of the LGE and the IGE, only face images are required to train our framework, i.e., no paired kinship face data is required. Based upon the proposed RFGs, a crossover and mutation module is further designed to inherit the facial parts of parents. A Gene Pool is also used to introduce variations into the mutation of RFGs. The diversity of the faces of descendants can thus be significantly increased. Qualitative, quantitative, and subjective experiments on FIW, TSKinFace, and FF-Databases clearly show that the quality and diversity of kinship faces generated by our approach are much better than those of the existing state-of-the-art methods.
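The crossover and mutation step over region-level genes can be pictured with a small sketch. It only illustrates the general genetic-algorithm idea and is not the paper's module: the region names, the 50/50 inheritance rule, the `mutation_rate` parameter, and the dictionary layout of the gene pool are all hypothetical.

```python
import random
from typing import Dict, List

Gene = List[float]
Face = Dict[str, Gene]  # region name -> gene vector for that region


def crossover_and_mutate(
    father: Face,
    mother: Face,
    gene_pool: Dict[str, List[Gene]],
    mutation_rate: float,
    rng: random.Random,
) -> Face:
    """Build a child's region-level genes.

    Crossover: each region is inherited from one parent (assumed 50/50).
    Mutation: with probability `mutation_rate`, the inherited gene is
    replaced by a gene drawn from the pool for the same region.
    """
    child: Face = {}
    for region in father:
        gene = father[region] if rng.random() < 0.5 else mother[region]
        if rng.random() < mutation_rate:
            gene = rng.choice(gene_pool[region])
        child[region] = list(gene)  # copy so parents stay unchanged
    return child


if __name__ == "__main__":
    # Hypothetical regions and toy two-dimensional genes.
    father = {"eyes": [0.1, 0.2], "nose": [0.3, 0.4], "mouth": [0.5, 0.6]}
    mother = {"eyes": [1.1, 1.2], "nose": [1.3, 1.4], "mouth": [1.5, 1.6]}
    pool = {
        "eyes": [[9.0, 9.1]],
        "nose": [[9.2, 9.3]],
        "mouth": [[9.4, 9.5]],
    }
    print(crossover_and_mutate(father, mother, pool, 0.1, random.Random(0)))
```

Sampling different random seeds yields different children from the same parents, which mirrors the diversity argument in the abstract.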
As 3D imaging develops rapidly and attracts growing interest, multi-view stereo based 3D reconstruction has become an important technique for such applications. To facilitate the subsequent processing of 3D...
Internet of Things systems are increasingly being deployed over the cloud (also referred to as the cloud of things, CoT) to provide a broader range of services. However, CoT faces serious challenges in data protection and s...
Depth estimation is an essential task for understanding the geometry of 3D scenes. Compared with multi-view-based methods, monocular depth estimation is more challenging because it requires integrating not only glo...
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in various tasks. However, effectively evaluating these MLLMs on face perception remains largely unexplored. To address this gap, we i...