The objective of Multimodal Knowledge Graph Completion (MKGC) is to predict missing entities in a knowledge graph by leveraging additional textual and visual modalities. Existing studies commonly use a single relation embedding to represent all modalities of an entity pair, thereby coupling the relations derived from different modalities. However, this coupling may introduce interference from conflicting information between modalities, since the relations expressed by different modalities of a given entity pair can be contradictory. Moreover, existing ensemble inference methods fail to dynamically adjust modal weights according to their differences and importance, even though different modalities contribute unevenly. In this paper, we propose the Multimodal Decouple and Relation-based Ensemble inference (MDRE) model. For each modality, we construct a separate relation embedding and build a separate triple representation to avoid interference among modalities. During the training phase, we employ confidence-constrained training with temperature scaling to alleviate conflicting information in the textual and visual modalities. For inference, we utilize the Relation-based Ensemble Inference method to adjust modal weights at the relation level, thus achieving improved predictions. Experimental results on two datasets demonstrate that MDRE outperforms existing single-modal and multimodal knowledge graph completion methods.
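The abstract gives no formulas for the temperature scaling or the relation-level ensemble. As a rough, hypothetical sketch of the two ideas (function names and the weighting scheme are our own, not taken from the paper):

```python
import numpy as np

def temperature_softmax(scores, tau=2.0):
    # Temperature scaling: tau > 1 flattens the distribution, damping
    # over-confident predictions that may stem from conflicting modalities.
    z = np.asarray(scores, dtype=float) / tau
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def relation_ensemble(modal_scores, relation_weights):
    # Relation-level ensemble (illustrative): each modality's candidate
    # scores are scaled by a weight chosen per relation, then summed.
    total = np.zeros_like(next(iter(modal_scores.values())), dtype=float)
    for modality, scores in modal_scores.items():
        total += relation_weights[modality] * np.asarray(scores, dtype=float)
    return total
```

Here a larger `tau` yields a softer candidate distribution, and `relation_weights` would in practice be learned or tuned per relation rather than fixed by hand.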
To enhance the accuracy of aluminum-silicon alloy microstructure image classification, we propose a classification model based on an improved residual network that incorporates attention mechanisms and a multi-scale design. The model is built upon the ResNet18 architecture and uses enhanced basic residual blocks to select informative features. Additionally, a multi-scale attention module is incorporated into the network to fuse multi-scale features and further extract effective features. Experimental results on an aluminum-silicon alloy microstructure image classification dataset demonstrate that, compared with the best-performing existing model, the proposed model improves classification accuracy, precision, recall, and F1 score by 3.17%, 0.74%, 5.53%, and 3.26%, respectively, reaching 94.97%, 96.39%, 93.97%, and 95.17%. These results indicate that the proposed model classifies aluminum-silicon alloy microstructure images well and can facilitate further research on the properties of aluminum-silicon alloys.
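The abstract does not specify how the enhanced residual blocks select features; one common choice is a squeeze-and-excitation style channel gate. A minimal NumPy sketch of that idea, with hypothetical weights (the paper's actual block may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_channel_attention(feat, w1, w2):
    # feat: (C, H, W) feature map. Squeeze to per-channel statistics,
    # excite through a small bottleneck, then rescale each channel.
    squeeze = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)  # reduction FC + ReLU
    gate = sigmoid(w2 @ hidden)             # per-channel gate in (0, 1)
    return feat * gate[:, None, None]       # reweight informative channels
```

Inside a residual block, the gated output would be added back to the identity branch, letting the network suppress uninformative channels.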
Recently, fake news detection on social media (SM) has attracted a lot of attention. With fake news emerging at a breakneck pace, its massive spread has had a serious impact on our society. The authenticity of news is often questionable, so there is a need for automated detection tools. However, most fake news detection methods are supervised, requiring huge amounts of annotated data, which is time-consuming and expensive to obtain and almost infeasible given the vast volume of new SM content. To deal with this problem, in this paper we propose a novel unsupervised fake news detection framework based on structural contrastive learning, which combines the propagation structure of news with contrastive learning to achieve unsupervised training. To validate the influence of parameters and the performance of our method, we design experiments on public Twitter and Weibo datasets, which show that our approach outperforms current baselines and exhibits good robustness.
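The abstract does not give the contrastive objective; unsupervised contrastive methods commonly use an InfoNCE-style loss over two views of each sample. A minimal sketch under that assumption (treating each row as an embedding of a news propagation graph; the paper's actual loss and encoder may differ):

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    # z1, z2: (N, d) row-normalized embeddings of two augmented views of
    # each propagation graph. Row i of z1 and row i of z2 are the positive
    # pair; all other rows in the batch serve as negatives.
    sim = (z1 @ z2.T) / tau                     # (N, N) scaled similarities
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))   # average positive-pair NLL
```

The loss is smallest when matched views agree and mismatched views do not, which is exactly the training signal that removes the need for labels.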
In real scenarios, images are often corrupted by complex degradation, and much useful information is lost, which makes super-resolution (SR) reconstruction severely ill-posed. To solve such a problem effectively, it is crucial to exploit image prior knowledge correctly. Although existing deep learning-based methods can obtain excellent results, they cannot handle complex degradation effectively, which leads to the loss of texture details and the destruction of edges. In this paper, an efficient multi-regularization method for SR is proposed, which can simultaneously exploit both internal and external image priors within a unified framework. A hybrid Tikhonov-TV prior and a deep denoiser prior are introduced to constrain the reconstruction process; that is, the proposed model combines the strengths of the piecewise-smooth prior and the deep prior. Moreover, an adaptive weight parameter is employed to make the hybrid component more detail-preserving. Experimental results demonstrate that the proposed method preserves image details better than advanced methods.
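As a rough illustration of the hybrid term (not the paper's exact formulation), a Tikhonov-TV regularizer can be written as a weighted sum of the squared and absolute gradient magnitudes, with a weight `alpha` trading smoothness against edge preservation:

```python
import numpy as np

def hybrid_regularizer(img, alpha):
    # Hybrid Tikhonov-TV prior (illustrative):
    #   alpha * ||grad(x)||_2^2  +  (1 - alpha) * ||grad(x)||_1
    # The Tikhonov part favors smooth regions; the TV part preserves edges.
    gx = np.diff(img, axis=1)  # horizontal finite differences
    gy = np.diff(img, axis=0)  # vertical finite differences
    tikhonov = (gx ** 2).sum() + (gy ** 2).sum()
    tv = np.abs(gx).sum() + np.abs(gy).sum()
    return alpha * tikhonov + (1.0 - alpha) * tv
```

In the paper's setting `alpha` is adaptive rather than fixed, so flat regions can lean toward Tikhonov smoothing while edges lean toward TV.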
Masked Language Modeling (MLM) and Image-Text Matching (ITM) are widely used in fusion encoders to learn joint representations of images and text. In existing methods, the masking strategy of MLM causes image details to be neglected during modeling. Meanwhile, the sampling strategy of ITM struggles to consistently select high-difficulty hard negative instances, reducing the effectiveness of its constraints. This makes it difficult to align fine-grained information in cross-modal retrieval. In response to this challenge, a fine-grained information alignment-based visual language model (FAM) is proposed in this paper. On one hand, an attribute-based masking strategy is employed in MLM, helping the model focus on the details of objects in images during modeling. On the other hand, a robust hard negative sample generation strategy provides challenging negative samples for ITM by altering the relationships between objects. This enables the model to align relationships between objects across modalities and thus calibrates cross-modal retrieval. Extensive experiments demonstrate the effectiveness of the model in cross-modal retrieval tasks.
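The attribute-based masking idea can be sketched as preferentially masking tokens drawn from an attribute vocabulary (colors, sizes, materials), forcing the model to recover them from the image. The function below is a hypothetical illustration, not the paper's implementation:

```python
import random

def attribute_mask(tokens, attribute_vocab, mask_token="[MASK]", p=0.8, seed=0):
    # Attribute-based masking (illustrative): tokens that name object
    # attributes are masked with probability p, so predicting them requires
    # attending to fine-grained visual details rather than text context alone.
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for tok in tokens:
        if tok.lower() in attribute_vocab and rng.random() < p:
            out.append(mask_token)
        else:
            out.append(tok)
    return out
```

In a real pipeline the attribute vocabulary would come from an external lexicon or a parser, and the masked positions would feed the MLM head of the fusion encoder.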
Image inpainting, which aims to reconstruct reasonably clear and realistic images from known pixel information, is one of the core problems in computer vision. However, due to the complexity and variability of the underwater environment, the inability of existing image inpainting techniques to extract valid pixels and the insufficient correlation between feature information lead to blurring in the generated images. Therefore, a novel gated attention feature fusion image inpainting network based on generative adversarial networks (GAF-GAN) is proposed. The accuracy of feature similarity matching depends heavily on the validity of the information contained in the features. On the one hand, gating values are dynamically generated by gated convolution to reduce the interference of invalid information. On the other hand, semantic information at distant locations in an image is accurately acquired by the attention mechanism. For these reasons, we design an improved gated attention mechanism, which makes the network focus on effective information such as high-frequency texture and the color fidelity of restored images. In addition, a dense feature fusion module is added to expand the overall receptive field of the network so that it fully learns the image features. Experimental results show that the proposed method can effectively repair defective images with complex texture structures and improve the realism and integrity of image details and structures.
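The core of gated convolution is that one branch produces features while a parallel branch produces a soft validity gate in (0, 1) that suppresses contributions from invalid (e.g. masked) pixels. A minimal sketch using 1x1 kernels, with hypothetical weights (real gated convolutions use learned spatial kernels):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_feature(x, w_feat, w_gate):
    # x: (C_in, H, W); w_feat, w_gate: (C_out, C_in) 1x1 "convolutions".
    # The gate branch learns, per output pixel, how valid the input is;
    # invalid regions are softly zeroed before feature fusion.
    feat = np.einsum('oc,chw->ohw', w_feat, x)          # feature branch
    gate = sigmoid(np.einsum('oc,chw->ohw', w_gate, x)) # validity in (0, 1)
    return feat * gate
```

Stacking such layers lets the network propagate "how much to trust this pixel" alongside the features themselves, which is what reduces the blurring caused by invalid information.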
Underwater images are often affected by problems such as light attenuation, color distortion, noise, and scattering, resulting in image defects. A novel image inpainting method is proposed to intelligently predict and fill damaged areas so that the image can be visualized completely and continuously. First, to effectively address the color distortion caused by light refraction in underwater environments, an improved gated attention mechanism is used; it improves local details by learning and weighting the important features of the image. Second, gated convolution automatically determines the degree of restoration for each pixel based on the local features of the original image, eliminating distractions such as low contrast and scattering while retaining more of the original detail. Together, these techniques improve the quality and visualization of underwater images.
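The per-pixel "degree of restoration" described above can be pictured as a convex blend between the original pixel and the inpainted estimate, controlled by a gate in [0, 1]. This is an illustrative reading of the abstract, not the paper's exact operator:

```python
import numpy as np

def blend_restoration(original, restored, gate):
    # gate[h, w] in [0, 1]: 1 means fully replace the pixel with the
    # inpainted estimate, 0 means keep the original pixel untouched.
    # Intermediate values preserve original detail in lightly damaged areas.
    return gate * restored + (1.0 - gate) * original
```

In the network, `gate` would be produced by the gated convolution from local features rather than supplied by hand.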
In the burgeoning field of autonomous vehicles (AVs), trajectory prediction remains a formidable challenge, especially in mixed autonomy environments. Traditional approaches often rely on computational methods such as...
The development of the Internet has connected people more closely and placed higher demands on recommendation models. Most recommendation models consider only the long-term interests of users. In this paper, the interaction time between the user and the item is introduced as auxiliary information when building the model: interaction time is used to distinguish users' long-term preferences from their short-term preferences. Temporal features are extracted by building a convolutional gated recurrent unit with attention neural network (CNN-GRU-Attention). Firstly, to extract features accurately, a CNN is constructed to extract higher-level, more abstract features and transform high-dimensional data into low-dimensional data. Secondly, to model social temporality, a GRU is used not only to extract temporal information but also to effectively reduce gradient dispersion, making the model easier to converge and train. Finally, graph attention networks are used to aggregate the social relationship information of users and items, which constitutes the final feature representations of users and items. In particular, a modified cosine similarity is used to reduce the error caused by data insensitivity when constructing the social information of items. Simulation experiments are conducted on two publicly available datasets (Epinions and Ciao), and the results show that the proposed recommendation model outperforms other social recommendation models, improving the MAE and RMSE metrics by 1.06%-1.33% and 1.19%-1.37%, respectively, which demonstrates the effectiveness of the model's innovations.
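The abstract does not define its modified cosine similarity; a common variant in recommendation is the adjusted cosine, which centers each vector on its own mean so that rating-scale offsets do not inflate similarity. A sketch under that assumption (the paper's modification may differ):

```python
import numpy as np

def adjusted_cosine(a, b):
    # Adjusted (mean-centered) cosine similarity: subtracting each vector's
    # mean removes constant offsets, making the measure sensitive to the
    # pattern of ratings rather than their absolute level.
    a = np.array(a, dtype=float)  # copy so callers' arrays are untouched
    b = np.array(b, dtype=float)
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)
```

Vectors that rise and fall together score near 1 even if one user rates systematically higher, which is the "insensitivity" plain cosine suffers from.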
In many underwater application scenarios, recognition tasks need to be executed promptly on computationally limited platforms. However, models designed for this field often exhibit spatial locality, and existing works lack the ability to capture crucial details in images. Therefore, a lightweight and detail-aware vision network (LDVNet) for resource-constrained environments is proposed to overcome these limitations. Firstly, to enhance the accuracy of target image recognition, we introduce transformer modules to acquire global information, addressing the spatial locality inherent in traditional convolutional neural networks (CNNs). Secondly, to keep the network lightweight, we integrate the transformer module with convolutional operations, mitigating the substantial overhead in parameters and floating point operations (FLOPs). Thirdly, to efficiently extract crucial fine-grained details from feature maps, we devise a channel and spatial attention module (C&SA), which aids intricate, fine-grained visual recognition and enhances image understanding; it is seamlessly integrated into LDVNet with nearly negligible parameter overhead. The experimental results demonstrate that LDVNet outperforms other lightweight and hybrid networks on different recognition tasks while remaining suitable for resource-constrained environments.
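Channel-then-spatial attention modules of this kind typically gate channels from global pooled statistics and then gate spatial positions from channel-wise statistics. The parameter-free sketch below conveys the data flow only; the actual C&SA module uses learned weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(feat):
    # feat: (C, H, W). Step 1: a per-channel gate from global average and
    # max pooling. Step 2: a per-position gate from channel-wise statistics.
    # Both gates lie in (0, 1) and reweight, never enlarge, the features.
    c_gate = sigmoid(feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2)))
    x = feat * c_gate[:, None, None]          # emphasize informative channels
    s_gate = sigmoid(x.mean(axis=0) + x.max(axis=0))
    return x * s_gate[None, :, :]             # emphasize informative positions
```

Because both stages are elementwise reweightings, the module adds almost no parameters, matching the "nearly negligible overhead" claim in the abstract.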