ISBN (print): 9781665448994
Burst image super-resolution is an ill-posed problem that aims to restore a high-resolution (HR) image from a sequence of low-resolution (LR) burst images. To restore a photo-realistic HR image using their abundant information, it is essential to align each burst of frames containing random hand-held motion. Some kernel prediction networks (KPNs) that operate without external motion compensation such as optical flow estimation have been applied to burst image processing as implicit image alignment modules. However, the existing methods do not consider the interdependencies among the kernels of different sizes that have a significant effect on each pixel. In this paper, we propose a novel weighted multi-kernel prediction network (WMKPN) that can learn the discriminative features on each pixel for burst image super-resolution. Our experimental results demonstrate that WMKPN improves the visual quality of super-resolved images. To the best of our knowledge, it outperforms the state of the art among kernel prediction methods and multiple frame super-resolution (MFSR) methods on both the Zurich RAW to RGB and BurstSR datasets.
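The idea of combining per-pixel kernels of several sizes can be illustrated with a small sketch. This is not the WMKPN architecture, which the abstract does not specify. It assumes a single frame, and the per-pixel kernels and per-pixel size weights are passed in as plain lists, whereas a real network would predict them. The function names `apply_per_pixel_kernels` and `weighted_multi_kernel` are hypothetical.

```python
def apply_per_pixel_kernels(image, kernels, k):
    """Filter `image` (H x W nested list) with a different k x k kernel per pixel.

    kernels[y][x] is a flat list of k*k weights for pixel (y, x).
    Border pixels are replicated (an assumption made for this sketch).
    """
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernels[y][x][(dy + r) * k + (dx + r)] * image[yy][xx]
            out[y][x] = acc
    return out


def weighted_multi_kernel(image, kernels_by_size, weights_by_size):
    """Sum the outputs of each kernel size, scaled by a per-pixel weight map.

    kernels_by_size: {k: per-pixel kernels for size k}
    weights_by_size: {k: H x W weight map for size k} (hypothetical inputs)
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for k, kernels in kernels_by_size.items():
        filtered = apply_per_pixel_kernels(image, kernels, k)
        for y in range(h):
            for x in range(w):
                out[y][x] += weights_by_size[k][y][x] * filtered[y][x]
    return out
```

In a burst setting the same combination would presumably be applied to every frame before merging, but that step is outside what the abstract describes.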
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Recent advancements in machine learning have spotlighted the potential of hyperbolic spaces as they effectively learn hierarchical feature representations. While there has been progress in leveraging hyperbolic spaces in single-modality contexts, their use in multimodal settings remains underexplored. A recent work has sought to transpose Euclidean multimodal learning techniques to hyperbolic spaces by adopting a geodesic-distance-based contrastive loss. However, we show both theoretically and empirically that such a spatial-proximity-based contrastive loss significantly disrupts hierarchies in the latent space. To remedy this, we advocate that the cross-modal representations should accept the inherent modality gap between text and images, and introduce a novel approach to measure cross-modal similarity that does not enforce spatial proximity. Our approach shows remarkable capabilities in preserving unimodal hierarchies while aligning the two modalities. Our experiments on a series of downstream tasks demonstrate that a better latent structure emerges with our objective function while being superior in text-to-image and image-to-text retrieval tasks.
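For readers unfamiliar with geodesic distance in hyperbolic space, the following minimal sketch computes it in the Poincaré ball model with curvature -1. The choice of model is an assumption: the abstract does not say which hyperbolic model the cited work or the proposed method uses, and the function name `poincare_distance` is only illustrative.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Uses d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))
```

Because the denominator shrinks as either point approaches the boundary, the same Euclidean gap corresponds to a much larger geodesic distance near the boundary than near the origin. A loss that pulls paired embeddings together under this distance therefore also constrains how far from the origin they can sit.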
ISBN (print): 9781665448994
This paper addresses the automatic identification of pelagic species in acoustic backscatter data. Large quantities of data acquired during underwater acoustic surveys for environmental monitoring and resources management, visualized as echograms, are typically analyzed manually or semi-automatically by marine biologists, which is time-consuming and prone to errors and inter-expert disagreements. In this paper, we propose to detect pelagic species (schools of herring and of juvenile salmon) from echograms with a deep learning (DL) framework based on instance segmentation, allowing us to carefully study the acoustic properties of the targets and to address specific challenges such as close proximity between schools and varying size. Experimental results demonstrate our system's ability to correctly detect pelagic species from echograms and to outperform an existing object detection framework designed for schools of herring in terms of detection performance and computational resources utilization. Our pixel-level detection method has the advantage of generating a precise identification of the pixel groups forming each detection, opening up many possibilities for automatic biological analyses.
ISBN (digital): 9798350393682
ISBN (print): 9798350393699
Computer vision has been widely used in the field of navigation safety, including ship identification, course prediction, and other applications, as a result of the fast development of pattern recognition and intelligent information processing technology. The sea-sky line (SSL) is a linear interval that separates the sky from the surface of the water. It is also an important reference in the perception of the marine environment, enabling vessels equipped with visual perception equipment to obtain information about the environment around them. Compared with the land environment, the sea surface environment is characterized by sudden shifts in the weather and the presence of noticeable waves. Consequently, there are still certain challenges involved in detecting the SSL using computer vision technology. In this study, a comprehensive analysis and summary of the sea-sky line detection algorithms used by several modal cameras is presented.
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
This manuscript delineates the outcomes of the fourth Multi-modal Aerial View Image Challenge - Classification (MAVIC-C). The challenge is aimed at advancing the development of recognition models that leverage Synthetic Aperture Radar (SAR) and Electro-Optical (EO) imagery. Encouraging the integration of data from these two distinct modalities, the challenge seeks to foster the creation of multi-modal approaches that complement characteristics of SAR and EO information. Building upon the precedents set in previous years, the 2021 MAVOC challenge validated the potential of integrating SAR and EO modalities. The subsequent 2022 and 2023 challenges further explored the capabilities of multi-modal frameworks. In its latest iteration, the 2024 challenge presents an enhanced UNIfied COincident Optical and Radar for recognition (UNICORN) dataset alongside a revised competition format, focused on the task of SAR classification. The 2024 challenge evaluates model robustness through out-of-distribution measures, alongside traditional accuracy metrics. The core of this paper is devoted to analyzing the methodologies of the top-performing entries and their performance metrics on a blind test set.
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Few-shot segmentation performance declines substantially when facing images from a domain different than the training domain, effectively limiting real-world use cases. To alleviate this, recently cross-domain few-shot segmentation (CD-FSS) has emerged. Works that address this task mainly attempted to learn segmentation on a source domain in a manner that generalizes across domains. Surprisingly, we can outperform these approaches while eliminating the training stage and removing their main segmentation network. We show test-time task-adaption is the key for successful CD-FSS instead. Task-adaption is achieved by appending small networks to the feature pyramid of a conventionally classification-pretrained backbone. To avoid overfitting to the few labeled samples in supervised fine-tuning, consistency across augmented views of input images serves as guidance while learning the parameters of the attached layers. Despite our self-restriction not to use any images other than the few labeled samples at test time, we achieve new state-of-the-art performance in CD-FSS, evidencing the need to rethink approaches for the task. Code is available at https://***/vision-Kek/ABCDFSS.
ISBN (print): 9781665448994
Sketch recognition algorithms are engineered and evaluated using publicly available datasets contributed by the sketch recognition community over the years. While existing datasets contain sketches of a limited set of generic objects, each new domain inevitably requires collecting new data for training domain specific recognizers. This gives rise to two fundamental concerns: First, will the data collection protocol yield ecologically valid data? Second, will the amount of collected data suffice to train sufficiently accurate classifiers? In this paper, we draw attention to these two concerns. We show that the ecological validity of the data collection protocol and the ability to accommodate small datasets are significant factors impacting recognizer accuracy in realistic scenarios. More specifically, using sketch-based gaming as a use case, we show that deep learning methods, as well as more traditional methods, suffer significantly from dataset shift. Furthermore, we demonstrate that in realistic scenarios where data is scarce and expensive, standard measures taken for adapting deep learners to small datasets fall short of comparing favorably with alternatives. Although transfer learning, and extensive data augmentation help deep learners, they still perform significantly worse compared to standard setups (e.g., SVMs and GBMs with standard feature representations). We pose learning from small datasets as a key problem for the deep sketch recognition field, one which has been ignored in the bulk of the existing literature.
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with the semantics of input prompts, and then personalizing the template using a concept fusion strategy. The fusion strategy incorporates the appearance of the target concepts into the template image while retaining its structural details. The results indicate that our method can generate multiple custom concepts with higher identity fidelity compared to alternative approaches. Furthermore, the method is shown to seamlessly handle more than two concepts and closely follow the semantic meaning of the input prompt without blending appearances across different subjects.
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Existing score distillation methods are sensitive to classifier-free guidance (CFG) scale, manifested as over-smoothness or instability at small CFG scales, while over-saturation at large ones. To explain and analyze these issues, we revisit the derivation of Score Distillation Sampling (SDS) and decipher existing score distillation with the Wasserstein Generative Adversarial Network (WGAN) paradigm. With the WGAN paradigm, we find that existing score distillation either employs a fixed sub-optimal discriminator or conducts incomplete discriminator optimization, resulting in the scale-sensitive issue. We propose the Adversarial Score Distillation (ASD), which maintains an optimizable discriminator and updates it using the complete optimization objective. Experiments show that the proposed ASD performs favorably in 2D distillation and text-to-3D tasks against existing methods. Furthermore, to explore the generalization ability of our paradigm, we extend ASD to the image editing task, which achieves competitive results. The project page and code are at this link.
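The CFG scale mentioned above controls how a conditional and an unconditional noise prediction are mixed. The sketch below shows the commonly used combination rule, in the convention where a scale of 1 recovers the conditional prediction. It is only the guidance step, not SDS or ASD, and the noise predictions are passed in as plain lists standing in for model outputs. The function name `cfg_combine` is hypothetical.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the conditional one by a factor of `scale`.

    eps_uncond, eps_cond: equal-length lists of floats (stand-ins for the
    noise predicted without and with the text condition).
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With `scale` above 1 the result is pushed past the conditional prediction, which is consistent with the over-saturation the abstract reports at large scales.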
ISBN (print): 9781665445092
Recently, some methods have focused on learning local relations among parts of pedestrian images for person re-identification (Re-ID), as this offers powerful representation capabilities. However, they only provide the intra-local relation among parts within a single pedestrian image and ignore the inter-local relation among parts from different images, which results in incomplete local relation information. In this paper, we propose a novel deep graph model named Heterogeneous Local Graph Attention Networks (HLGAT) to model the inter-local relation and the intra-local relation in the completed local graph simultaneously. Specifically, we first construct the completed local graph using local features, and we resort to the attention mechanism to aggregate the local features in the learning process of inter-local relation and intra-local relation so as to emphasize the importance of different local features. As for the inter-local relation, we propose the attention regularization loss to constrain the attention weights based on the identities of local features in order to describe the inter-local relation accurately. As for the intra-local relation, we propose to inject the contextual information into the attention weights to consider structure information. Extensive experiments on Market-1501, CUHK03, DukeMTMC-reID and MSMT17 demonstrate that the proposed HLGAT outperforms the state-of-the-art methods.