This paper explores the application and central role of vision and generation algorithms in digital media art. Through deep learning and computer vision technology, these algorithms not only process image and v...
We propose a quantum-weighted autoencoder network for compressing computer-generated holograms. The quantum-weighted autoencoder consists of embedding, entanglement, and measurement layers. Experimental results sh...
ISBN (digital): 9789811021046
ISBN (print): 9789811021039; 9789811021046
This edited volume contains technical contributions in the field of computer vision and image processing presented at the First International Conference on Computer Vision and Image Processing (CVIP 2016). The contributions are divided thematically according to their relation to operations at the lower, middle and higher levels of vision systems, and to their applications. The technical contributions in the areas of sensors, acquisition, visualization and enhancement are classified as related to low-level operations. They discuss various modern topics, including reconfigurable image system architecture, Scheimpflug camera calibration, real-time autofocusing, climate visualization, tone mapping, super-resolution and image resizing. The technical contributions in the areas of segmentation and retrieval are classified as related to mid-level operations. They discuss state-of-the-art techniques such as non-rigid image registration, iterative image partitioning, egocentric object detection and video shot boundary detection. The technical contributions in the areas of classification and retrieval are categorized as related to high-level operations. They discuss state-of-the-art approaches such as extreme learning machines, and target, gesture and action recognition. A non-regularized state-preserving extreme learning machine is presented for natural scene classification. An algorithm for human action recognition through dynamic frame warping based on depth cues is given. Target recognition in night vision using a convolutional neural network is also presented, as is the use of a convolutional neural network for detecting static hand gestures. Finally, the technical contributions in the areas of surveillance, coding and data security, and biometrics and document processing are considered applications of computer vision and image processing. They discuss some contemporary applications. A few of them are a system for tackling blind curves, a quick reaction target acquisition and tracking sys
ISBN (print): 9783031776090; 9783031776106
With the advent of large pre-trained transformer models, fine-tuning these models for various downstream tasks is a critical problem. Paucity of training data, the existence of data silos, and stringent privacy constraints exacerbate this fine-tuning problem in the medical imaging domain, creating a strong need for algorithms that enable collaborative fine-tuning of pre-trained models. Moreover, the large size of these models necessitates the use of parameter-efficient fine-tuning (PEFT) to reduce the communication burden in federated learning. In this work, we systematically investigate various federated PEFT strategies for adapting a Vision Transformer (ViT) model (pre-trained on a large natural image dataset) for medical image classification. Apart from evaluating known PEFT techniques, we introduce new federated variants of PEFT algorithms such as visual prompt tuning (VPT), low-rank decomposition of visual prompts, stochastic block attention fine-tuning, and hybrid PEFT methods like low-rank adaptation (LoRA)+VPT. Moreover, we perform a thorough empirical analysis to identify the optimal PEFT method for the federated setting and understand the impact of data distribution on federated PEFT, especially for out-of-domain (OOD) and non-IID data. The key insight of this study is that while most federated PEFT methods work well for in-domain transfer, there is a substantial accuracy vs. efficiency trade-off when dealing with OOD and non-IID scenarios, which is commonly the case in medical imaging. Specifically, every order-of-magnitude reduction in fine-tuned/exchanged parameters can lead to a 4% drop in accuracy. Thus, the choice of the initial model is critical for the effectiveness of federated PEFT: rather than starting with general vision models, it is preferable to use medical foundation models (if available) learned using in-domain medical image data. Code: https://***/Naiftt/PEFT.
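To make the communication argument in the abstract above concrete, the following is a minimal sketch, not the authors' code, of why exchanging LoRA adapters is cheaper than exchanging full weight matrices, followed by a plain sample-weighted federated average of adapter parameters. The hidden size (768), rank (8), number of layers (12) and the choice of adapting two projections per layer are illustrative assumptions, and the function names are hypothetical.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters in the full (d_out x d_in) weight matrix."""
    return d_in * d_out


def fedavg(client_updates, client_sizes):
    """Sample-weighted average of flat parameter vectors, one per client."""
    total = sum(client_sizes)
    dim = len(client_updates[0])
    return [
        sum(update[i] * size for update, size in zip(client_updates, client_sizes)) / total
        for i in range(dim)
    ]


# Assumed, illustrative configuration (not taken from the paper).
HIDDEN = 768        # hidden size of a ViT-Base-like model
RANK = 8            # LoRA rank
ADAPTED = 2 * 12    # two adapted projections in each of 12 layers

lora_total = ADAPTED * lora_param_count(HIDDEN, HIDDEN, RANK)
full_total = ADAPTED * full_param_count(HIDDEN, HIDDEN)
reduction = full_total / lora_total

# Two hypothetical clients send tiny adapter vectors; the server averages them
# weighted by local dataset size.
averaged = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])

if __name__ == "__main__":
    print(lora_total, full_total, reduction)
    print(averaged)
```

Under these assumptions each client exchanges roughly 48 times fewer parameters for the adapted matrices, which is between one and two orders of magnitude; the abstract's trade-off concerns what such reductions cost in accuracy for OOD and non-IID data.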
Multimodal Large Language Models have shown a strong ability to solve general vision-language tasks, such as image captioning and visual question answering, where they are usually on par with or even better than h...
Computer vision uses machines instead of humans to measure and judge: a camera device converts the captured target scene into an image signal, which is transmitted to the image processing system and converted into a...
In saliency detection for panoramic images, traditional equirectangular and cube projection methods often face issues such as distortion and discontinuities, impacting d...
ISBN (print): 9783031821226; 9783031821233
Image processing algorithms are now widely used for tracking the position and estimating the posture of animals; for those who study neurophysiology, the automated analysis of recorded videos makes it possible to associate a specific behavior with cerebral activity. The principal purpose is to avoid long, tiring observation times and human errors in the validation of the recording. For aquatic animals like crayfish, developing a system that can track position and estimate posture is crucial, as water introduces image distortion. The crayfish is a good model for studying behaviors such as aggressiveness, sleep and exploration. A computer vision system that detects the animal's position in the aquarium and whether it is lying on one side, described as the stereotypical sleep position, lets the observer analyze only specific moments of the recording before reaching a decision. The results presented in this work show that it is possible to use image processing to determine the position of a crayfish in the aquarium and to establish when the animal is lying on one side, allowing us to plot a graph of the position coordinates and the "sleep coordinates" (those moments when the crayfish was lying on one side), i.e., a whole hypnogram, in an unsupervised way.
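The abstract above does not state how position or posture is computed, so the following is only a hedged sketch of one plausible pipeline: the animal is assumed to appear darker than the background, its position is taken as the centroid of the thresholded pixels, and a per-frame "lying on one side" flag (assumed to come from some separate posture detector) is turned into hypnogram-style episodes. The threshold value and both function names are hypothetical.

```python
def centroid(frame, threshold=128):
    """Centroid (row, col) of pixels darker than `threshold` in a grayscale
    frame given as a list of rows; returns None if no pixel qualifies."""
    count = 0
    row_sum = 0
    col_sum = 0
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value < threshold:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


def sleep_episodes(flags):
    """Convert per-frame booleans (True = lying on one side) into a list of
    (start_frame, end_frame) intervals, with end_frame exclusive."""
    episodes = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            episodes.append((start, i))
            start = None
    if start is not None:
        episodes.append((start, len(flags)))
    return episodes


# Toy 3x3 frame: a dark 2x2 blob in the lower-right corner.
toy_frame = [
    [255, 255, 255],
    [255, 0, 0],
    [255, 0, 0],
]
toy_position = centroid(toy_frame)
toy_episodes = sleep_episodes([False, True, True, False, True])

if __name__ == "__main__":
    print(toy_position)
    print(toy_episodes)
```

Plotting the centroid per frame alongside the episode intervals would give a graph of the kind the abstract describes; a real system would also need to handle the water-induced distortion it mentions.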
This paper presents a novel approach for smoke removal and image restoration using a Multi-Scale Dilated Generative Adversarial Network (MSDGAN). The presence of smoke in images poses significant challenges to both hu...
ISBN (print): 9789819786848; 9789819786855
With the development of Artificial Intelligence Generated Content (AIGC), fake image detection has become increasingly challenging. Leveraging the advanced capabilities of large language models (LLMs) in sequence prediction, we propose a novel perspective on fake image detection by fine-tuning pure LLMs. We introduce Fake-GPT, an LLM with 7 billion parameters that can differentiate between real and fake images. Unlike conventional image processing models, our approach directly processes RGB pixel values without relying on any position embedding or visual-language feature alignment, thereby reducing model complexity and processing steps. Our research demonstrates the effective application of LLMs in detecting fake images, thereby expanding their application in non-textual domains. Extensive experiments conducted on various deepfake datasets show that Fake-GPT achieves competitive results compared with conventional image processing models, underscoring its potential as a new paradigm in the realm of image authentication.
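The idea of feeding raw RGB values to a sequence model can be illustrated with a small sketch. The abstract does not specify how Fake-GPT serializes pixels, so the raster-order flattening below is an assumption made purely for illustration, and `image_to_sequence` is a hypothetical helper rather than part of the published method.

```python
def image_to_sequence(image):
    """Flatten an image (rows of (R, G, B) tuples) into a 1-D list of integer
    channel values in raster order, validating the 0..255 range."""
    sequence = []
    for row in image:
        for pixel in row:
            for value in pixel:
                if not 0 <= value <= 255:
                    raise ValueError(f"channel value out of range: {value}")
                sequence.append(value)
    return sequence


# Toy 2x2 image: red, green, blue, and an arbitrary dark pixel.
toy_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
toy_sequence = image_to_sequence(toy_image)

if __name__ == "__main__":
    print(toy_sequence)
```

A sequence like this carries no explicit spatial information; any notion of position would have to be inferred by the model from the ordering itself, which is consistent with the abstract's claim of not relying on position embeddings.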