ISBN: 9798350353006 (print)
The proceedings contain 2715 papers. The topics discussed include: revisiting adversarial training at scale; SPIDeRS: structured polarization for invisible depth and reflectance sensing; MA-LMM: memory-augmented large multimodal model for long-term video understanding; geometrically-driven aggregation for zero-shot 3D point cloud understanding; TextCraftor: your text encoder can be image quality controller; ViLa-MIL: dual-scale vision-language multiple instance learning for whole slide image classification; HumanNorm: learning normal diffusion model for high-quality and realistic 3D human generation; an empirical study of scaling law for scene text recognition; improving image restoration through removing degradations in textual representations; and steganographic passport: an owner and user verifiable credential for deep model IP protection without retraining.
ISBN: 9798350301298 (print)
The proceedings contain 2356 papers. The topics discussed include: exploring discontinuity for video frame interpolation; two-view geometry scoring without correspondences; language-guided audio-visual source separation via trimodal consistency; handwritten text generation from visual archetypes; Bayesian posterior approximation with stochastic ensembles; ERM-KTP: knowledge-level machine unlearning via knowledge transfer; PlenVDB: memory efficient VDB-based radiance fields for fast training and rendering; learning and aggregating lane graphs for urban automated driving; teaching matters: investigating the role of supervision in vision transformers; NeuralField-LDM: scene generation with hierarchical latent diffusion models; cut and learn for unsupervised object detection and instance segmentation; probabilistic debiasing of scene graphs; and unifying layout generation with a decoupled diffusion model.
ISBN: 9783031818530 (print)
The proceedings contain 16 papers. The special focus in this conference is on Segment Anything in Medical Images on Laptop. The topics include: Filters, Thresholds, and Geodesic Distances for Scribble-Based Interactive Segmentation of Medical Images; Rep-MedSAM: Towards Real-Time and Universal Medical Image Segmentation; Swin-LiteMedSAM: A Lightweight Box-Based Segment Anything Model for Large-Scale Medical Image Datasets; A Light-Weight Universal Medical Segmentation Network for Laptops Based on Knowledge Distillation; Taking a Step Back: Revisiting Classical Approaches for Efficient Interactive Segmentation of Medical Images; ExpertsMedSAM: Faster Medical Image Segment Anything with Mixture-of-Experts; Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment; Lite Class-Prompt Tiny-VIT for Multi-modality Medical Image Segmentation; Segment Anything in Medical Images with nnUNet; SwiftMedSAM: An Ultra-lightweight Prompt-Based Universal Medical Image Segmentation Model for Highly Constrained Environments; RepViT-MedSAM: Efficient Segment Anything in the Medical Images; U-MedSAM: Uncertainty-Aware MedSAM for Medical Image Segmentation; Modality-Specific Strategies for Medical Image Segmentation Using Lightweight SAM Architectures; Gray's Anatomy for Segment Anything Model: Optimizing Grayscale Medical Images for Fast and Lightweight Segmentation.
ISBN: 9781665469463 (print)
The proceedings contain 2072 papers. The topics discussed include: clipped hyperbolic classifiers are super-hyperbolic classifiers; efficient deep embedded subspace clustering; noise is also useful: negative correlation-steered latent contrastive learning; active learning for open-set annotation; understanding and increasing efficiency of Frank-Wolfe adversarial training; robust optimization as data augmentation for large-scale graphs; a re-balancing strategy for class-imbalanced classification based on instance difficulty; the devil is in the margin: margin-based label smoothing for network calibration; towards better plasticity-stability trade-off in incremental learning: a simple linear connector; learning Bayesian sparse networks with full experience replay for continual learning; a variational Bayesian method for similarity learning in non-rigid image registration; learning to learn by jointly optimizing neural architecture and weights; learning to prompt for continual learning; multi-frame self-supervised depth with transformers; and rethinking Bayesian deep learning methods for semi-supervised volumetric medical image segmentation.
ISBN: 9781665445092 (print)
The proceedings contain 1658 papers. The topics discussed include: single-stage instance shadow detection with bidirectional relation learning; learning Delaunay surface elements for mesh reconstruction; fusing the old with the new: learning relative camera pose with geometry-guided uncertainty; uncertainty guided collaborative training for weakly supervised temporal action detection; privacy-preserving collaborative learning with automatic transformation search; rethinking and improving the robustness of image style transfer; style-aware normalized loss for improving arbitrary style transfer; faster meta update strategy for noise-robust deep learning; a hyperbolic-to-hyperbolic graph convolutional network; training networks in null space of feature covariance for continual learning; and exponential moving average normalization for self-supervised and semi-supervised learning.
ISBN: 9781728171685 (print)
The proceedings contain 2 papers. The topics discussed include: attention mechanism exploits temporal contexts: real-time 3D human pose reconstruction; and cascaded deep monocular 3D human pose estimation with evolutionary training data.
ISBN: 9798350353006 (print)
Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by the inference latency of autoregressive generation. We propose a parallel-decoding sequence-to-sequence vision-language model, trained with a Query-CTC loss, that marginalizes over multiple inference paths in the decoder. This allows us to model the joint distribution of tokens, rather than restricting to a conditional distribution as in an autoregressive model. The resulting model, NARVL, achieves performance on par with its state-of-the-art autoregressive counterpart but is faster at inference time, replacing the linear complexity of sequential token generation with constant-time joint inference.
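To make "marginalizes over multiple inference paths" concrete, the following is a minimal sketch of the standard CTC forward recursion, which sums the probability of every alignment that collapses to a given target sequence. It is not the paper's Query-CTC loss, whose details the abstract does not give; the function name `ctc_log_likelihood`, the blank index, and the input layout are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, target, blank=0):
    """Log-probability of `target`, summed over all CTC alignments.

    log_probs: list of T rows, each a list of V log-probabilities
               (one independent prediction per decoder position).
    target:    list of token ids, none equal to `blank`.
    """
    # Extended target with blanks interleaved: [b, y1, b, y2, ..., b]
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    S = len(ext)

    # alpha[s] = log-prob of all partial alignments ending in ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])     # advance by one
            # Skip over a blank only between two different tokens
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = _logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new

    # A valid alignment ends on the last token or the trailing blank
    return _logsumexp(alpha[-2:]) if S > 1 else alpha[-1]
```

Because every position's distribution is produced in one pass, the loss can be computed without generating tokens sequentially; the negative of this value would serve as the training loss under these assumptions.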
ISBN: 9798350353006 (print)
Although noise and caption quality are acknowledged as important factors in vision-language contrastive pre-training, we show in this paper that the full potential of improving the training process by addressing these issues is yet to be realized. Specifically, we first study and analyze two issues affecting training: incorrect assignment of negative pairs, and low caption quality and diversity. Then, we devise effective solutions to both problems, which essentially require training with multiple true positive pairs. Finally, we propose training with a sigmoid loss to meet this requirement. We show very large gains over the current state of the art for both image recognition (~+6% on average over 11 datasets) and image retrieval (~+19% on Flickr30k and ~+15% on MSCOCO).
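The reason a sigmoid loss accommodates multiple true positives is that it scores each image-text pair independently, so any number of pairs in a row can be labeled positive. The sketch below assumes a pairwise sigmoid formulation with a scale and a bias; the function names and the default `scale` and `bias` values are hypothetical and are not taken from the paper.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def multi_positive_sigmoid_loss(sim, positives, scale=10.0, bias=-5.0):
    """Pairwise sigmoid loss over an image-text similarity matrix.

    sim:       list of rows; sim[i][j] is the similarity of image i, text j.
    positives: set of (i, j) index pairs treated as true matches; a row
               may contain several (this is the multi-positive case).
    scale, bias: assumed logit scale and offset (hypothetical values).
    """
    total = 0.0
    for i, row in enumerate(sim):
        for j, s in enumerate(row):
            label = 1.0 if (i, j) in positives else -1.0
            # Each pair is an independent binary classification problem
            total -= log_sigmoid(label * (scale * s + bias))
    return total / len(sim)
```

Usage: with `sim = [[1.0, 1.0]]`, marking both captions positive via `positives = {(0, 0), (0, 1)}` yields a lower loss than marking only one, which a softmax-based contrastive loss with a single target per row cannot express directly.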
ISBN: 9798350353006 (digital)
ISBN: 9798350353006 (print)
A recent trend among generalizable novel view synthesis methods is to learn a rendering operator acting over single camera rays. This approach is promising because it removes the need for explicit volumetric rendering, but it effectively treats target images as collections of independent pixels. Here, we propose to learn a global rendering operator acting over all camera rays jointly. We show that the right representation to enable such rendering is a 5-dimensional plane sweep volume consisting of the projection of the input images on a set of planes facing the target camera. Based on this understanding, we introduce our Convolutional Global Latent Renderer (ConvGLR), an efficient convolutional architecture that performs the rendering operation globally in a low-resolution latent space. Experiments on various datasets under sparse and generalizable setups show that our approach consistently outperforms existing methods by significant margins.
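As an illustration of how a plane sweep volume is built, the sketch below computes, for each source view and each fronto-parallel plane in front of the target camera, where every target pixel lands in the source image. Sampling source colors at those coordinates would add a channel axis, giving one plausible reading of the five dimensions (view, plane, channel, height, width); the abstract does not spell out the layout. The pinhole model, the `(fx, fy, cx, cy)` intrinsics tuple, and the function names are assumptions.

```python
def warp_pixel(u, v, depth, K_tgt, K_src, R, t):
    """Map target pixel (u, v), lifted to the plane z = depth in the
    target camera frame, into source-image coordinates.

    K_tgt, K_src: (fx, fy, cx, cy) pinhole intrinsics (assumed layout).
    R, t:         rotation (3x3 nested lists) and translation (3 values)
                  such that X_src = R @ X_tgt + t.
    """
    fx, fy, cx, cy = K_tgt
    # Back-project the pixel onto the plane at the given depth
    X = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Move the 3D point into the source camera frame
    Xs = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    fx, fy, cx, cy = K_src
    # Perspective projection into the source image
    return (fx * Xs[0] / Xs[2] + cx, fy * Xs[1] / Xs[2] + cy)


def plane_sweep_grid(width, height, depths, K_tgt, sources):
    """Sampling coordinates indexed as grid[view][plane][y][x].

    sources: list of (K_src, R, t) tuples, one per input view.
    """
    return [
        [
            [[warp_pixel(x, y, d, K_tgt, K, R, t) for x in range(width)]
             for y in range(height)]
            for d in depths
        ]
        for (K, R, t) in sources
    ]
```

A convolutional renderer in the spirit of the abstract would consume the colors sampled at these coordinates for all rays at once, rather than processing each target ray separately.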