While deep learning techniques have shown promising performance in the Major Depressive Disorder (MDD) detection task, they still face limitations in real-world scenarios. Specifically, given the data scarcity, some e...
详细信息
Person re-identification is a prevalent technology deployed on intelligent *** have been remarkable achievements in person re-identification methods based on the assumption that all person images have a sufficiently h...
详细信息
Person re-identification is a prevalent technology deployed on intelligent *** have been remarkable achievements in person re-identification methods based on the assumption that all person images have a sufficiently high resolution,yet such models are not applicable to the open *** real world,the changing distance between pedestrians and the camera renders the resolution of pedestrians captured by the camera *** low-resolution(LR)images in the query set are matched with high-resolution(HR)images in the gallery set,it degrades the performance of the pedestrian matching task due to the absent pedestrian critical information in LR *** address the above issues,we present a dualstream coupling network with wavelet transform(DSCWT)for the cross-resolution person re-identification ***,we use the multi-resolution analysis principle of wavelet transform to separately process the low-frequency and high-frequency regions of LR images,which is applied to restore the lost detail information of LR ***,we devise a residual knowledge constrained loss function that transfers knowledge between the two streams of LR images and HR images for accessing pedestrian invariant features at various *** qualitative and quantitative experiments across four benchmark datasets verify the superiority of the proposed approach.
Text-to-image synthesis refers to generating visual-realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to be h...
详细信息
Text-to-image synthesis refers to generating visual-realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to be high-resolution. Despite the remarkable progress, these methods are limited in fully utilizing the given texts and could generate text-mismatched images, especially when the text description is complex. We propose a novel finegrained text-image fusion based generative adversarial networks(FF-GAN), which consists of two modules: Finegrained text-image fusion block(FF-Block) and global semantic refinement(GSR). The proposed FF-Block integrates an attention block and several convolution layers to effectively fuse the fine-grained word-context features into the corresponding visual features, in which the text information is fully used to refine the initial image with more details. And the GSR is proposed to improve the global semantic consistency between linguistic and visual features during the refinement process. Extensive experiments on CUB-200 and COCO datasets demonstrate the superiority of FF-GAN over other state-of-the-art approaches in generating images with semantic consistency to the given texts.
Domain adaptation aims to transfer knowledge from the labeled source domain to an unlabeled target domain that follows a similar but different ***,adversarial-based methods have achieved remarkable success due to the ...
详细信息
Domain adaptation aims to transfer knowledge from the labeled source domain to an unlabeled target domain that follows a similar but different ***,adversarial-based methods have achieved remarkable success due to the excellent performance of domain-invariant feature presentation ***,the adversarial methods learn the transferability at the expense of the discriminability in feature representation,leading to low generalization to the target *** this end,we propose a Multi-view Feature Learning method for the Over-penalty in Adversarial Domain ***,multi-view representation learning is proposed to enrich the discriminative information contained in domain-invariant feature representation,which will counter the over-penalty for discriminability in adversarial ***,the class distribution in the intra-domain is proposed to replace that in the inter-domain to capture more discriminative information in the learning of transferrable *** experiments show that our method can improve the discriminability while maintaining transferability and exceeds the most advanced methods in the domain adaptation benchmark datasets.
data partitioning techniques are pivotal for optimal data placement across storage devices,thereby enhancing resource utilization and overall system ***,the design of effective partition schemes faces multiple challen...
详细信息
data partitioning techniques are pivotal for optimal data placement across storage devices,thereby enhancing resource utilization and overall system ***,the design of effective partition schemes faces multiple challenges,including considerations of the cluster environment,storage device characteristics,optimization objectives,and the balance between partition quality and computational ***,dynamic environments necessitate robust partition detection *** paper presents a comprehensive survey structured around partition deployment environments,outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are *** discuss partitioning features pertaining to database schema,table data,workload,and runtime *** then delve into the partition generation process,segmenting it into initialization and optimization stages.A comparative analysis of partition generation and update algorithms is provided,emphasizing their suitability for different scenarios and optimization ***,we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and *** survey aims to foster the implementation,deployment,and updating of high-quality partitions for specific system scenarios.
The video grounding(VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video ...
详细信息
The video grounding(VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. Particularly, we present a simple but effective proposal-free framework, namely video grounding transformer(ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows.(1) The token is unrelated to the video or the query and avoids data bias toward the original video and query.(2) The token simultaneously performs global context aggregation from video and query ***, we employed a sharing feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention(i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets:ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
Nowadays,the personalized recommendation has become a research hotspot for addressing information *** this,generating effective recommendations from sparse data remains a ***,auxiliary information has been widely used...
详细信息
Nowadays,the personalized recommendation has become a research hotspot for addressing information *** this,generating effective recommendations from sparse data remains a ***,auxiliary information has been widely used to address data sparsity,but most models using auxiliary information are linear and have limited *** to the advantages of feature extraction and no-label requirements,autoencoder-based methods have become quite ***,most existing autoencoder-based methods discard the reconstruction of auxiliary information,which poses huge challenges for better representation learning and model *** address these problems,we propose Serial-Autoencoder for Personalized Recommendation(SAPR),which aims to reduce the loss of critical information and enhance the learning of feature ***,we first combine the original rating matrix and item attribute features and feed them into the first autoencoder for generating a higher-level representation of the ***,we use a second autoencoder to enhance the reconstruction of the data representation of the prediciton rating *** output rating information is used for recommendation *** experiments on the MovieTweetings and MovieLens datasets have verified the effectiveness of SAPR compared to state-of-the-art models.
Domain adaptation aims to transfer knowledge between different domains to develop an effective hypothesis in the target domain with scarce labeled data, which is an effective method for remedying the problem of labele...
详细信息
On-site lithium-ion battery state of health (SoH) estimation is of crucial importance for reliable operations of electric vehicles (EVs). Yet, due to the low-quality of unlabeled real-time field data, diverse operatin...
详细信息
Graph pattern matching is a technique widely used in various fields such as protein structure analysis, social group querying, and expert localization. This technique involves finding matching subgraphs in large socia...
详细信息
暂无评论