The rapid development of the Internet has led to the widespread dissemination of manipulated facial images, significantly impacting people's daily lives. With the continuous advancement of Deepfake technology, generated counterfeit facial images have become increasingly difficult to distinguish from real ones, and there is an urgent need for a more robust and convincing detection method. Current detection methods mainly operate in the spatial domain or transform spatial-domain inputs into other domains for analysis. With the emergence of transformers, some researchers have also combined traditional convolutional networks with transformers for detection. This paper explores the artifacts left by Deepfakes in various domains and, based on this exploration, proposes a detection method that uses the steganalysis rich model to extract high-frequency noise that complements spatial features. Building on traditional convolutional neural networks, we design two main modules that fully leverage the interaction between these two kinds of features. The first is the multi-scale mixed feature attention module, which introduces artifacts from high-frequency noise into spatial textures, thereby enhancing the model's learning of spatial texture features. The second is the multi-scale channel attention module, which reduces the impact of background noise by weighting the features. We evaluated the proposed method on mainstream datasets, and extensive experimental results demonstrate its effectiveness in detecting Deepfake forged faces, outperforming the majority of existing methods.
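To make the high-frequency noise extraction step concrete, the following is a minimal sketch of a steganalysis-rich-model style high-pass filter. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which SRM kernels are used, so the sketch applies one widely used 5x5 kernel (often called the "KV" filter), and the function name `high_pass_residual` and the edge-replication border handling are choices made here.

```python
# Assumption: a single 5x5 SRM-style high-pass kernel (the "KV" filter),
# normalised by 12. The paper may use a different or larger kernel set.
KV_KERNEL = [
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]


def high_pass_residual(image, kernel=KV_KERNEL, scale=12.0):
    """Return the noise residual of a 2-D grayscale image (a list of rows).

    Borders are handled by replicating edge pixels, so the output has the
    same shape as the input. Smooth content is suppressed and only
    high-frequency detail survives in the residual.
    """
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    residual = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates to replicate the border pixels.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += kernel[dy + radius][dx + radius] * image[yy][xx]
            residual[y][x] = acc / scale
    return residual
```

Because the kernel coefficients sum to zero, a flat region yields a zero residual, while an isolated pixel perturbation produces a strong response. In a detector of the kind described above, a residual map like this would be fed to the network alongside the spatial (RGB) features.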
Faced with the rapid development of social networks and the enormous business opportunities they contain, data mining and analysis based on social networks has become an inevitable trend. By utilizing various technolo...
Since OpenAI opened access to ChatGPT, large language models (LLMs) have become an increasingly popular topic, attracting researchers’ attention from abundant ***, public researchers meet some problems when developing LLMs, given that most LLMs are produced by industry and the training details are typically *** datasets are an important setup of LLMs, this paper does a holistic survey on the training datasets used in both the pre-train and fine-tune *** paper first summarizes 16 pre-train datasets and 16 fine-tune datasets used in the state-of-the-art ***, based on the properties of the pre-train and fine-tune processes, it comments on pre-train datasets from quality, quantity, and relation with models, and comments on fine-tune datasets from quality, quantity, and *** study then critically identifies the problems and research trends that exist in current LLM *** study helps public researchers train and investigate LLMs by visual cases and provides useful comments to the research community regarding data *** the best of our knowledge, this paper is the first to summarize and discuss datasets used in both autoregressive and chat *** survey offers insights and suggestions to researchers and LLM developers as they build their models, and contributes to the LLM study by pointing out the existing problems of LLM studies from the perspective of data.
In recent years, Wi-Fi sensing has garnered significant attention due to its numerous benefits, such as privacy protection, low cost, and penetration ability. Extensive research has been conducted in this field, focus...
Recently, the multimodal large language model (MLLM) represented by GPT-4V has been a new rising research hotspot, which uses powerful large language models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of the MLLM, such as writing stories based on images and optical character recognition–free math reasoning, are rare in traditional multimodal methods, suggesting a potential path to artificial general intelligence. To this end, both academia and industry have endeavored to develop MLLMs that can compete with or even outperform GPT-4V, pushing the limit of research at a surprising speed. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the basic formulation of the MLLM and delineate its related concepts, including architecture, training strategy and data, as well as evaluation. Then, we introduce research topics about how MLLMs can be extended to support more granularity, modalities, languages and scenarios. We continue with multimodal hallucination and extended techniques, including multimodal in-context learning, multimodal chain of thought and LLM-aided visual reasoning. To conclude the paper, we discuss existing challenges and point out promising research directions.
In this paper, we address the problem of unsupervised social network embedding, which aims to embed network nodes, including node attributes, into a latent low-dimensional *** recent methods, the fusion mechanism of node attributes and network structure has been proposed for the problem and achieved impressive prediction ***, the non-linear property of node attributes and network structure is not efficiently fused in existing methods, which is potentially helpful in learning a better network *** this end, in this paper, we propose a novel model called ASM (Adaptive Specific Mapping) based on encoder-decoder *** encoder, we use the kernel mapping to capture the non-linear property of both node attributes and network *** particular, we adopt two feature mapping functions, namely an untrainable function for node attributes and a trainable function for network *** the mapping functions, we obtain the low-dimensional feature vectors for node attributes and network structure, ***, we design an attention layer to combine the learning of both feature vectors and adaptively learn the node *** encoder, we adopt the component of reconstruction for the training process of learning node attributes and network *** conducted a set of experiments on seven real-world social network *** experimental results verify the effectiveness and efficiency of our method in comparison with state-of-the-art baselines.
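The attention layer that combines the attribute and structure feature vectors can be pictured with a small sketch. Everything here is an assumption made for illustration: the abstract does not give the form of the attention layer, so the sketch uses a plain dot-product score against a hypothetical query vector followed by a softmax, and the names `attention_fuse`, `attr_vec`, `struct_vec` and `query` are not from the paper.

```python
import math


def attention_fuse(attr_vec, struct_vec, query):
    """Fuse two same-length feature vectors with softmax attention weights.

    attr_vec   -- low-dimensional feature vector from node attributes
    struct_vec -- low-dimensional feature vector from network structure
    query      -- hypothetical attention parameter (learned in a real model)

    Returns (fused_vector, [attribute_weight, structure_weight]).
    """
    # Score each source by its dot product with the query vector.
    scores = [
        sum(q * v for q, v in zip(query, vec))
        for vec in (attr_vec, struct_vec)
    ]
    # Numerically stable softmax over the two scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The node embedding is the attention-weighted sum of both vectors.
    fused = [
        weights[0] * a + weights[1] * s
        for a, s in zip(attr_vec, struct_vec)
    ]
    return fused, weights
```

With a zero query both sources receive equal weight; as training moves the query toward one source, the node embedding leans on that source, which is one way to read the "adaptive" fusion the abstract describes.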
Free-viewpoint video allows the user to view objects from any virtual perspective, creating an immersive visual *** technology enhances the interactivity and freedom of multimedia ***, many free-viewpoint video synthesis methods hardly satisfy the requirement to work in real time with high precision, particularly for sports fields having large areas and numerous moving *** address these issues, we propose a free-viewpoint video synthesis method based on distance field *** central idea is to fuse multiview distance field information and use it to adjust the search step size *** step size search is used in two ways: for fast estimation of multiobject three-dimensional surfaces, and synthetic view rendering based on global occlusion *** have implemented our ideas using parallel computing for interactive display, using CUDA and OpenGL frameworks, and have used real-world and simulated experimental datasets for *** results show that the proposed method can render free-viewpoint videos with multiple objects on large sports fields at 25 ***, the visual quality of our synthetic novel viewpoint images exceeds that of state-of-the-art neural-rendering-based methods.
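The idea of letting a distance field set the search step size can be illustrated with classic sphere tracing, in which each step along a ray advances by the distance to the nearest surface. This is a minimal single-threaded sketch under stated assumptions, not the paper's CUDA implementation: an analytic sphere (`sphere_sdf`) stands in for the fused multiview distance field, and the function names and default parameters are hypothetical.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 5.0), radius=1.0):
    """Signed distance from a 3-D point to a sphere (stand-in distance field)."""
    return math.dist(point, center) - radius


def adaptive_ray_march(origin, direction, distance_field,
                       max_steps=64, eps=1e-4, max_depth=100.0):
    """Walk along a ray, using the distance field value as the step size.

    Far from any surface the steps are large; near a surface they shrink.
    Returns the depth of the first hit, or None if the ray misses.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    unit = tuple(c / norm for c in direction)
    depth = 0.0
    for _ in range(max_steps):
        point = tuple(o + depth * u for o, u in zip(origin, unit))
        dist = distance_field(point)
        if dist < eps:
            return depth  # Close enough to the surface: report a hit.
        depth += dist     # Safe step: nothing is closer than `dist`.
        if depth > max_depth:
            break
    return None
```

For a camera at the origin looking along +z, `adaptive_ray_march((0, 0, 0), (0, 0, 1), sphere_sdf)` reaches the sphere surface in two distance-field evaluations, whereas a fixed small step would need many more samples; this saving is what makes a distance-field-guided search attractive for real-time rendering.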
Insect fine-grained image classification is an application scenario in fine-grained image classification. It not only has the characteristics of small inter-class differences and large intra-class differences, but als...
With the continuous development of Internet technology, using multimedia for virtual applications has become a new approach. In this paper, by introducing the Bayesian classification algorithm, multimedia graphics and video...
Backdoor attacks involve the injection of a limited quantity of poisoned samples containing triggers into the training dataset. During the inference stage, backdoor attacks can uphold a high level of accuracy for norm...