With the rapid development of drone technology, object detection technology is widely applied in the field of drone intelligent transportation. In response to the problems of low accuracy and high computational comple...
ISBN (Digital): 9798350368741
ISBN (Print): 9798350368758
This paper investigates a low-light image enhancement method based on the diffusion bridge framework. Low-light image enhancement still faces challenges in noise reduction and detail restoration, and existing diffusion-model methods are time-consuming and have unstable diffusion processes. We conduct an in-depth study of diffusion bridge theory, combining the advantages of its end-to-end paradigm with a nonlinear activation network to further improve low-light image enhancement performance. In addition, we use a Gamma correction module to fine-tune low-light images, significantly improving performance at almost no extra computational cost. Experiments show that Low-Light image enhancement with Diffusion Bridge (LLDB) far surpasses other methods on the LOLv1 and LOLv2 datasets. Code is available at https://***/M-Chase/LLDB
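As a rough illustration of the kind of Gamma correction fine-tuning the abstract mentions, the sketch below applies an optionally learnable power-law curve to a normalised image. The class name, default gamma value, and learnability are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of a gamma-correction refinement step (hypothetical; the
# paper's module may differ). A gamma < 1 brightens dark regions at
# negligible computational cost.
import torch
import torch.nn as nn

class GammaCorrection(nn.Module):
    def __init__(self, init_gamma: float = 0.6, learnable: bool = True):
        super().__init__()
        gamma = torch.tensor(init_gamma)
        self.gamma = nn.Parameter(gamma) if learnable else gamma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is an image tensor normalised to [0, 1]; the clamp keeps the power stable.
        return x.clamp(min=1e-6) ** self.gamma

if __name__ == "__main__":
    img = torch.rand(1, 3, 256, 256) * 0.2          # a synthetic dark image
    corrected = GammaCorrection()(img)
    print(corrected.mean().item() > img.mean().item())  # brightened on average
```

A learnable gamma lets the refinement be trained jointly with the enhancement network while adding only a single extra parameter.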
ISBN (Digital): 9798350368741
ISBN (Print): 9798350368758
The growing demand for high-quality 3D human rendering in real-world applications highlights significant challenges, particularly in dealing with occlusion in monocular video. Previous methods often rely on controlled datasets and overlook the inherent symmetry of the human body, leading to incomplete rendering in occluded areas. To address these limitations, we propose SymGaussian, a novel Gaussian Splatting-based approach for rendering occluded humans from monocular video. We introduce a Multi-scale Symmetry Feature to compensate for lost information in occluded areas, along with a Projective Texture Mapping method that efficiently encodes 2D appearance while preserving 3D perception. Experiments show that SymGaussian outperforms state-of-the-art methods in rendering quality while achieving rapid training and real-time rendering at over 200 FPS.
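The sketch below illustrates one plausible way left-right body symmetry could be used to fill in features for occluded regions: canonical-space points are mirrored across the sagittal plane and borrow features from their visible mirrored neighbours. The function names, the brute-force nearest-neighbour search, and the assumption of at least one visible point are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch (not the SymGaussian code) of exploiting left-right body
# symmetry: canonical-space Gaussian centres are mirrored about the sagittal
# (x = 0) plane so features observed on the visible side can supplement the
# occluded side.
import torch

def mirror_canonical_points(points: torch.Tensor) -> torch.Tensor:
    """points: (N, 3) Gaussian centres in a canonical, body-aligned frame."""
    mirrored = points.clone()
    mirrored[:, 0] = -mirrored[:, 0]        # flip across the sagittal plane
    return mirrored

def symmetric_feature_fill(points, features, visibility):
    """Replace features of occluded points with those of their mirrored neighbours.
    features: (N, C); visibility: (N,) bool mask (True = observed)."""
    mirrored = mirror_canonical_points(points)
    # nearest visible neighbour of each mirrored point (brute force for clarity)
    dists = torch.cdist(mirrored, points[visibility])   # (N, N_vis)
    nearest = dists.argmin(dim=1)
    donor = features[visibility][nearest]
    out = features.clone()
    out[~visibility] = donor[~visibility]
    return out
```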
Deep learning has strongly promoted the research progress of unimodal feature learning such as face recognition and speech recognition, but it still needs to be strengthened in multimodal data representation learning,...
ISBN (Digital): 9798350368741
ISBN (Print): 9798350368758
Recent video editing advancements rely on accurate pose sequences to animate human actors. However, these efforts are not suitable for cross-species animation due to pose misalignment between species (for example, the poses of a cat differ greatly from those of a pig because of their distinct body structures). In this paper, we present Anima², a zero-shot diffusion-based video generator that addresses this issue, aiming to accurately ANIMAte ANIMAls while preserving the background. The key technique is a two-fold subject alignment. First, we improve appearance feature extraction by integrating a Laplacian detail booster and a prompt-tuning identity extractor, which capture essential appearance information, including identity and fine details. Second, we align shape features and resolve conflicts between differing animals by introducing a scale-information remover and an adaptive rescaling module, both of which enhance subject alignment for accurate cross-species animation. Additionally, we introduce two high-quality animal video datasets with diverse species to benchmark cross-species animation. Trained on these extensive datasets, our model directly generates videos with accurate movements, consistent appearances, and high-fidelity frames, eliminating the need for test-time training. Extensive experiments demonstrate our method's superiority in cross-species animation, showcasing robust adaptability and generality.
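As a hedged sketch of what adaptive rescaling of a driving signal might look like in practice, the snippet below rescales and re-centres a pose keypoint sequence so that its bounding box matches the target subject's. The function name, keypoint format, and isotropic-scale choice are assumptions for illustration only, not the modules described in the abstract.

```python
# Hypothetical scale normalisation of a driving pose sequence so that its
# bounding box matches the target animal's, reducing cross-species shape
# conflicts (names and details are illustrative, not from the paper).
import numpy as np

def rescale_pose_sequence(driving: np.ndarray, target_box: tuple) -> np.ndarray:
    """driving: (T, K, 2) keypoints; target_box: (x0, y0, x1, y1) of the target subject."""
    x0, y0, x1, y1 = target_box
    lo = driving.reshape(-1, 2).min(axis=0)
    hi = driving.reshape(-1, 2).max(axis=0)
    scale = np.array([x1 - x0, y1 - y0]) / np.maximum(hi - lo, 1e-6)
    s = scale.min()                           # isotropic scale preserves proportions
    centre_src = (lo + hi) / 2
    centre_dst = np.array([(x0 + x1) / 2, (y0 + y1) / 2])
    return (driving - centre_src) * s + centre_dst
```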
User-generated content (UGC) has become more and more popular on the web and the published review is an essential type of UGC. Nevertheless, the explosion of reviews brings a problem of severe information overload. Th...
Many existing knowledge graph embedding methods learn semantic representations for entities by using graph neural networks (GNN) to harvest their intrinsic relevances. However, these methods mostly represent every ent...
ISBN (Digital): 9798350368741
ISBN (Print): 9798350368758
Image restoration refers to the process of restoring a damaged low-quality image to its corresponding high-quality image. Recently, a special type of diffusion bridge model has achieved more advanced results in image restoration: it recasts the direct mapping from low-quality to high-quality images as a diffusion process and restores low-quality images through the reverse process. However, current diffusion bridge restoration models do not emphasize conditional control, which may limit performance. This paper introduces the ECDB model, which enhances control of the diffusion bridge by using low-quality images as conditions. Moreover, in view of the low denoising level of diffusion models at larger values of t, we also propose a Conditional Fusion Schedule, which handles the conditional feature information of the various modules more effectively. Experimental results show that the ECDB model achieves state-of-the-art results on many image restoration tasks, including deraining, inpainting and super-resolution. Code is available at https://***/Hammour-steak/ECDB.
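The sketch below gives one hypothetical way a low-quality image could act as a condition inside a diffusion-bridge reverse step, with a simple t-dependent weight standing in for a conditional fusion schedule. The module names, the blending formula, and the denoiser interface are assumptions, not the ECDB implementation.

```python
# Conceptual sketch of condition injection in a diffusion-bridge restoration
# step (hypothetical interface). The low-quality image y acts as the condition,
# and a t-dependent weight mimics a "conditional fusion schedule": noisier
# steps (larger t) rely more on y.
import torch
import torch.nn as nn

class ConditionalBridgeStep(nn.Module):
    def __init__(self, denoiser: nn.Module):
        super().__init__()
        self.denoiser = denoiser                  # any network taking (features, t)

    def fusion_weight(self, t: torch.Tensor) -> torch.Tensor:
        # simple monotone schedule in [0, 1]; the paper's schedule may differ
        return t.clamp(0.0, 1.0)

    def forward(self, x_t: torch.Tensor, y: torch.Tensor, t: torch.Tensor):
        w = self.fusion_weight(t).view(-1, 1, 1, 1)
        cond = w * y + (1.0 - w) * x_t            # blend condition with current state
        return self.denoiser(torch.cat([x_t, cond], dim=1), t)
```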
ISBN (Print): 9798400707674
Visual Question Answering is a very challenging task in the field of AI. Existing models are not accurate enough in judging the relationships between question words, among image objects, and between the question and the image objects; meanwhile, the interpretability of such models also needs to be improved. In response, this paper proposes a visual question answering model based on a relationship graph and co-attention. Relational features are obtained by constructing a relational graph from the relationships between the question and the image and between image objects, and by dynamically updating the relational graph with a question-guided graph attention network. The obtained features are then fused using co-attention to obtain richer ***, and these features are used for answer prediction. The model is evaluated on the VQA2.0 dataset, and the experimental results show that it improves the accuracy of visual question answering to some extent compared with existing models.
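To make the fusion stage more concrete, here is a minimal sketch of question-guided co-attention between a pooled question feature and a set of object features. The dimensions, head count, and fusion layer are illustrative assumptions rather than the exact model described above.

```python
# Minimal sketch of question-guided co-attention fusion (illustrative only).
# Object features are attended by the question representation, and the two are
# fused into a joint feature for answer prediction.
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, question: torch.Tensor, objects: torch.Tensor) -> torch.Tensor:
        """question: (B, 1, D) pooled question feature; objects: (B, N, D) object features."""
        attended, _ = self.attn(query=question, key=objects, value=objects)
        fused = torch.cat([question, attended], dim=-1).squeeze(1)
        return self.fuse(fused)               # (B, D) joint feature for the answer head
```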
ISBN (Print): 9798400707674
In recent years, with the improvement of hardware computing power and the rapid development of deep learning theory, the field of dialogue generation has also entered the era of deep learning. However, because it is difficult to generate content-rich responses by learning semantics from the dialogue corpus alone, many models tend to produce replies that are only weakly related to the topic of the input dialogue, or generic responses that lack knowledge. To address these issues, a Heterogeneous Graph Neural Network Incorporating External Knowledge Dialogue Generation Model (HEDG) is proposed. First, external knowledge is organized into a knowledge graph, and relevant external knowledge is retrieved from the graph according to certain rules based on the entities appearing in the input conversation. This external knowledge and the historical conversation context are then used as inputs to a heterogeneous graph encoder, and the conversation content and the corresponding external knowledge are represented by a Heterogeneous Graph Neural Network (HGNN). Finally, a Transformer-based decoder takes the encoded heterogeneous graph representation as input and generates a response that is relevant to the conversation context and extended with knowledge. The experimental results show that the model achieves higher relevance and diversity than existing comparison models and can effectively integrate external knowledge.
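As an illustration of the retrieval rule described above (entities in the dialogue selecting relevant triples from the knowledge graph), the sketch below performs a simple one-hop lookup; the data layout and the max_triples cap are assumptions for illustration, not the HEDG implementation.

```python
# Illustrative one-hop knowledge retrieval: entities mentioned in the dialogue
# select triples from the knowledge graph, which would then be passed to a
# heterogeneous graph encoder together with the conversation context.
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

def retrieve_one_hop(dialogue_entities: List[str], kg: List[Triple],
                     max_triples: int = 50) -> List[Triple]:
    entities = set(dialogue_entities)
    hits = [t for t in kg if t[0] in entities or t[2] in entities]
    return hits[:max_triples]

# usage
kg = [("Paris", "capital_of", "France"), ("France", "located_in", "Europe")]
print(retrieve_one_hop(["Paris"], kg))   # [('Paris', 'capital_of', 'France')]
```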