ISBN: 9798350365474 (digital); 9798350365481 (print)
The capabilities of foundation models, most recently the Segment Anything Model, have garnered considerable attention for providing a versatile framework for tackling a wide array of image segmentation tasks. However, the interplay between human prompting strategies and the segmentation performance of these models remains understudied, as does the role played by the domain knowledge that humans (through previous exposure) and models (through pretraining) bring to the prompting process. To bridge this gap, we present the PointPrompt dataset, compiled across multiple image modalities with multiple prompting annotators per modality. We collected a total of 16 image datasets from the natural, underwater, medical, and seismic domains to create a comprehensive resource for studying prompting behavior and agreement across modalities. Overall, our prompting dataset contains 158,880 inclusion points and 52,594 exclusion points over a total of 6,000 images. Our analysis highlights (i) the viability of prompts across heterogeneous data, (ii) the value of point prompts for enhancing the robustness and generalizability of segmentation models across diverse domains, and (iii) how prompts facilitate an understanding of the dynamics between annotation strategies and neural network outcomes. Information on downloading the dataset, images, and prompting tool is provided on our project website: https://***/pointprompt/.
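The interaction this dataset captures, annotators placing inclusion and exclusion points that the Segment Anything Model consumes, can be reproduced with Meta's public `segment_anything` API. The sketch below is illustrative rather than the authors' prompting tool; the checkpoint path, image path, and point coordinates are placeholders:

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM backbone (checkpoint path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Point prompts are (x, y) pixel coordinates; label 1 marks an inclusion
# point, label 0 an exclusion point -- the two prompt types in PointPrompt.
point_coords = np.array([[320, 240], [400, 260], [100, 80]])
point_labels = np.array([1, 1, 0])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,  # SAM returns three candidate masks with scores
)
best_mask = masks[np.argmax(scores)]
```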
When stylizing 3D scenes, current methods must render full-resolution images from different views and optimize the stylized radiance fields with a style loss that was proposed for 2D style transfer and must be computed over the whole image. This is quite inefficient when stylizing a large-scale scene. This paper proposes a more efficient method, DeSRF, for stylizing radiance fields that also transfers style information to the geometry according to the input style. To achieve this, on the one hand, we introduce a deformable module that learns the geometric style contained in the input style image and transfers it to the radiance fields. On the other hand, although the style loss must be computed over the entire image, we do not actually need to process all the rays when updating the stylized radiance fields. Motivated by this observation, we propose a new training strategy called Dilated Sampling (DS) for efficient stylization propagation. Experimental results show that our method is more efficient and produces more visually plausible stylized 3D scenes with geometric style information than existing approaches.
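The abstract does not spell out how Dilated Sampling selects rays. One plausible reading, sketched below purely as an illustration, is that each update renders only a strided (dilated) subset of pixels whose offset shifts over time, so the style loss sees a low-resolution rendering that still covers the whole view. `render_rays` and `style_loss` are hypothetical stand-ins for a radiance-field renderer and a VGG/Gram style loss:

```python
import torch

def dilated_stylization_step(radiance_field, style_image,
                             render_rays, style_loss,
                             h, w, stride, step):
    """One hypothetical stylization update that renders only a dilated
    pixel grid instead of the full h*w image."""
    off = step % stride  # shift the grid each step to cover all pixels
    ys = torch.arange(off, h, stride)
    xs = torch.arange(off, w, stride)
    coords = torch.cartesian_prod(ys, xs)            # (N, 2) pixel coords
    rendered = render_rays(radiance_field, coords)   # (N, 3) colors
    rendered = rendered.reshape(len(ys), len(xs), 3)
    # The subsampled render spans the whole view, so a global style loss
    # remains meaningful while far fewer rays are processed per update.
    return style_loss(rendered, style_image)
```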
ISBN: 9798350365474 (digital); 9798350365481 (print)
Diffusion models (DMs) can generate realistic images with text guidance using large-scale datasets. However, they offer limited controllability over the generated images. We introduce iEdit, a novel method for text-guided image editing conditioned on a source image and a textual prompt. As no fully annotated dataset with target images exists, previous approaches perform subject-specific fine-tuning at test time or adopt contrastive learning without a target image, leading to issues in preserving source-image fidelity. We propose to automatically construct a dataset derived from LAION-5B containing pseudo-target images and descriptive edit prompts. This dataset allows us to incorporate a weakly supervised loss function that generates the pseudo-target image from the source image's latent noise conditioned on the edit prompt. To encourage localised editing, we propose a loss function that uses segmentation masks to guide the editing during training and, optionally, at inference. Trained with limited GPU resources on the constructed dataset, our model outperforms counterparts in image fidelity, CLIP alignment score, and qualitative evaluation on both generated and real images.
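As a hedged sketch of the localisation idea (not iEdit's actual objective; all names and weights are assumptions), one can weight the denoising loss by a segmentation mask so errors inside the edit region are penalised strongly and errors outside it only weakly:

```python
import torch
import torch.nn.functional as F

def masked_edit_loss(pred_noise, true_noise, mask, w_in=1.0, w_out=0.1):
    """Hypothetical mask-guided loss for localised editing.

    pred_noise, true_noise: (B, C, H, W) latent noise estimate and target.
    mask: (B, 1, H, W) segmentation mask in [0, 1] marking the edit region.
    """
    per_pixel = F.mse_loss(pred_noise, true_noise, reduction="none")
    inside = (per_pixel * mask).mean()           # weighted heavily
    outside = (per_pixel * (1.0 - mask)).mean()  # weighted lightly
    return w_in * inside + w_out * outside
```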
Recently, video frame interpolation using a combination of frame- and event-based cameras has surpassed traditional image-based methods both in terms of performance and memory efficiency. However, current methods stil...
ISBN: 9798350365474 (digital); 9798350365481 (print)
Exposure correction aims to recover the brightness and structural information of overexposed or underexposed images. Areas with different exposure levels differ in recovery difficulty: severely exposed areas are harder to recover than commonly exposed ones because they suffer greater loss of structural information. However, existing methods focus on simultaneously recovering global brightness and structure, ignoring that recovery difficulty varies between areas. To address this, we propose a novel exposure correction strategy named "Inpainting Assisted Exposure Correction" (IAEC), which first performs image structure repair on severely exposed areas to guide the subsequent exposure correction. The method is based on the observation that the contextual semantic information contained in image structure effectively aids overall image recovery, and that this information is badly lacking in severely exposed areas. Pre-repair by an inpainting model can compensate for the contextual semantic information lost to severe exposure. We therefore use an inpainting model to repair the structure of severely exposed areas and then align the structure-repaired image with the improperly exposed input at the feature level. Extensive experiments demonstrate that our method achieves superior results compared with state-of-the-art methods and has the potential to be applied to other tasks with similar context-loss problems.
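The pre-repair step described above can be sketched as follows: flag severely exposed pixels by luminance thresholds, then hand that mask to an off-the-shelf inpainting model. This is a schematic under assumed thresholds, not the paper's implementation; `inpaint_model` is a placeholder for any image-inpainting network:

```python
import numpy as np

def severe_exposure_mask(image, low=0.05, high=0.95):
    """Flag pixels whose luminance is near black or near saturation.
    `image` is float RGB in [0, 1]; thresholds are illustrative."""
    luma = image @ np.array([0.299, 0.587, 0.114])
    return (luma < low) | (luma > high)

def pre_repair(image, inpaint_model):
    mask = severe_exposure_mask(image)
    # Inpainting restores structure in the severely exposed areas,
    # supplying the contextual semantics that the exposure-correction
    # network later aligns with at the feature level.
    repaired = inpaint_model(image, mask)
    return repaired, mask
```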
ISBN: 9798350365474 (digital); 9798350365481 (print)
Data augmentations are widely used in training medical image deep learning models to increase the diversity and size of sparse datasets. However, commonly used augmentation techniques can result in the loss of clinically relevant information from medical images, leading to incorrect predictions at inference time. We propose the Interactive Medical Image Learning (IMIL) framework, a novel approach for improving the training of medical image analysis algorithms that enables clinician-guided intermediate training data augmentations on misprediction outliers, focusing the algorithm on relevant visual information. To prevent the model from using irrelevant features during training, IMIL "blacks out" clinician-designated irrelevant regions and replaces the original images with the augmented samples. This ensures that for originally mispredicted samples, the algorithm subsequently attends only to relevant regions and correctly correlates them with the respective diagnosis. We validate the efficacy of IMIL with radiology residents and compare its performance to state-of-the-art data augmentations. A 4.2% improvement in accuracy over ResNet-50 was observed when using IMIL on only 4% of the training set. Our study demonstrates the utility of clinician-guided interactive training for achieving meaningful data augmentations in medical image analysis algorithms.
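The "blackout" operation itself is straightforward; the region format and names below are assumptions for illustration, not IMIL's released code:

```python
import numpy as np

def blackout(image, irrelevant_regions):
    """Zero out clinician-designated irrelevant regions.

    image: (H, W, C) array; irrelevant_regions: list of (y0, y1, x0, x1)
    boxes the reviewing clinician marked as clinically irrelevant.
    """
    out = image.copy()
    for y0, y1, x0, x1 in irrelevant_regions:
        out[y0:y1, x0:x1, :] = 0
    return out

# The augmented sample replaces the original mispredicted image in the
# training set, so the model can no longer rely on the masked features.
```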
ISBN: 9798350365474 (digital); 9798350365481 (print)
In recent years, we have witnessed the collection of larger and larger multi-modal image-caption datasets, growing from hundreds of thousands of pairs to hundreds of millions. Such datasets allow researchers to build powerful deep learning models, at the cost of intensive computational resources. In this work, we ask: can we use such datasets efficiently without sacrificing performance? We tackle this problem by extracting a difficulty score from each image-caption sample and using these scores to make training more effective and efficient. We compare two ways of using difficulty scores to influence training: filtering a representative subset of each dataset, and ordering samples through curriculum learning. We analyze and compare difficulty scores extracted from a single modality, either captions (i.e., caption length and number of object mentions) or images (i.e., region proposals' size and number), or from the alignment of image-caption pairs (i.e., CLIP and concreteness). We focus on Weakly-Supervised Object Detection, where image-level labels are extracted from captions. We find that (1) combining filtering and curriculum learning can achieve large gains in performance, but not all methods are stable across experimental settings; (2) single-modality scores often outperform alignment-based ones; and (3) alignment scores show the largest gains when training time is limited.
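To make the two uses of difficulty scores concrete, here is a minimal sketch (function names are placeholders; caption length is just one of the single-modality scores the paper compares):

```python
def caption_length_score(sample):
    # Single-modality difficulty proxy: longer captions ~ harder samples.
    return len(sample["caption"].split())

def filter_subset(dataset, score_fn, keep_frac=0.5):
    """Filtering: train only on the easiest fraction of the data."""
    ranked = sorted(dataset, key=score_fn)
    return ranked[: int(len(ranked) * keep_frac)]

def curriculum_order(dataset, score_fn):
    """Curriculum learning: present samples in easy-to-hard order."""
    return sorted(dataset, key=score_fn)
```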
ISBN: 9798350365474 (digital); 9798350365481 (print)
Building and maintaining High-Definition (HD) maps represents a large barrier to autonomous vehicle deployment. This, along with advances in modern online map detection models, has sparked renewed interest in the online mapping problem. However, effectively predicting online maps at a quality high enough to enable safe, driverless deployment remains a significant challenge. Recent work proposes training robust online mapping systems on low-quality map priors with synthetic perturbations that attempt to simulate out-of-date HD map priors. In this paper, we investigate how models trained on these synthetically perturbed map priors generalize to deployment-scale, real-world map changes. We present a large-scale experimental study to determine which synthetic perturbations are most useful for generalizing to real-world HD map changes, evaluated on multiple years of real-world autonomous driving data. We show that there is still a substantial sim2real gap between synthetic prior perturbations and observed real-world changes, which limits the utility of current prior-informed HD map prediction models.
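As a hedged sketch of what a synthetic prior perturbation can look like (the drop/jitter scheme and magnitudes are illustrative assumptions, not the schemes evaluated in the paper), one can treat the map prior as a set of polylines and randomly delete or displace elements:

```python
import numpy as np

def perturb_map_prior(polylines, drop_prob=0.1, jitter_std=0.5, rng=None):
    """Simulate an out-of-date HD map prior by randomly deleting map
    elements and jittering the remaining vertices (units: metres)."""
    rng = rng if rng is not None else np.random.default_rng()
    perturbed = []
    for line in polylines:  # each line: (N, 2) array of xy vertices
        if rng.random() < drop_prob:
            continue        # element missing from the stale prior
        perturbed.append(line + rng.normal(0.0, jitter_std, size=line.shape))
    return perturbed
```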
ISBN: 9798350365474 (digital); 9798350365481 (print)
Open-world recognition has recently gained significant attention owing to its ability to bridge the gap between experimental scenarios and real-world applications. Because continual learning can learn from a sequence of dynamic data streams, it finds extensive application in open-world recognition. However, because data annotation is usually time-consuming and labor-intensive in real-world scenarios, it is necessary to develop unsupervised continual learning. Recent studies have begun to investigate unsupervised continual learning (UCL), but mainly focus on rehearsal and regularization strategies to enhance its anti-forgetting capability. In practice, rehearsal and regularization are information-dependent: they require information from previous data as supervised signals, e.g., replayed data or a previous model. In this paper, we propose an information-free method, Alternate Task Discrimination (ATD), a self-supervised pretext task for continual learning that improves anti-forgetting capability by encouraging the model to discriminate which data stream the current sample comes from. The whole process does not rely on any previous information. To perform ATD effectively in the UCL framework, we design an alternating optimization algorithm in which UCL and ATD are optimized in turn. We validate the effectiveness of the proposed method on multiple standard UCL benchmarks, where it obtains considerable improvements over baseline methods. In addition, our approach can be used as a plug-in unit that yields further gains when combined with existing popular UCL methods.
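A hedged sketch of the pretext idea as the abstract states it (a reading, not the authors' code): a lightweight head classifies which data stream a sample's representation comes from, trained with cross-entropy and alternated with the UCL objective:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamDiscriminator(nn.Module):
    """Predicts the index of the data stream a feature came from."""
    def __init__(self, feat_dim, num_streams):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_streams)

    def forward(self, features):
        return self.head(features)

def atd_step(encoder, discriminator, x, stream_id, optimizer):
    """One ATD update: discriminate the current stream without using
    any replayed data or previous model copies."""
    logits = discriminator(encoder(x))
    target = torch.full((x.size(0),), stream_id,
                        dtype=torch.long, device=logits.device)
    loss = F.cross_entropy(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```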
ISBN: 9798350365474 (digital); 9798350365481 (print)
Person ReID systems for rapidly evolving urban surveillance applications are severely challenged by domain shifts, i.e., variations in data distribution that occur across different environments or times. In this paper, we provide the first empirical review of domain shift in person ReID, covering three settings: Unsupervised Domain Adaptation ReID, Domain Generalizable ReID, and Lifelong ReID. We observe that existing approaches only tackle domain shifts caused by the cross-dataset setting, while ignoring intra-dataset attribute domain shifts caused by changes in clothing, shape, or gait, which are very common in ReID. We therefore broaden the research direction in this field by redefining domain shift in ReID as the combination of attribute domain shift and cross-dataset domain shift. With a focus on Lifelong ReID methods, we conduct an extensive comparison on a fair experimental setup and provide an in-depth analysis of these methods under both non-cloth-changing and cloth-changing ReID scenarios. We study the strengths and limitations of these methods based on their performance. This paper outlines future research directions and paves the way for the development of more adaptive, resilient, and enduring cross-domain ReID systems. Code is available here.