Deep learning has revolutionized the field of computer vision by introducing large-scale neural networks with millions of parameters. Training these networks requires massive datasets and leads to opaque models that can fail to generalize. At the other extreme, models designed from partial differential equations (PDEs) embed specialized domain knowledge into mathematical equations and usually rely on a few manually chosen hyperparameters. This makes them transparent by construction, and, if designed and calibrated carefully, they can generalize well to unseen scenarios. In this paper, we show how to bring model- and data-driven approaches together by combining explicit PDE-based approaches with convolutional neural networks to obtain the best of both worlds. We illustrate a joint architecture for the task of inpainting optical flow fields and show that the combination of model- and data-driven modeling leads to an effective architecture. Our model outperforms both fully explicit and fully data-driven baselines in terms of reconstruction quality, robustness, and amount of required training data. Averaging the endpoint error across different mask densities, our method outperforms the explicit baselines by 11-27%, the GAN baseline by 47%, and the probabilistic diffusion baseline by 42%. With that, our method sets a new state of the art for inpainting of optical flow fields from random masks.
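PDE-based inpainting of the kind referenced above typically propagates known flow vectors into unmeasured regions by solving a diffusion equation, with the observed pixels held fixed as constraints. As a minimal, illustrative sketch (not the paper's implementation; the function name, step size, and iteration count are assumptions), homogeneous diffusion inpainting of a sparsely masked flow field can be written as:

import numpy as np

def diffusion_inpaint_flow(flow, mask, n_iters=2000, tau=0.2):
    # Homogeneous-diffusion inpainting of a sparse optical flow field.
    # flow: (H, W, 2) array of flow vectors; entries outside the mask are ignored.
    # mask: (H, W) boolean array, True where the flow is known.
    u = np.where(mask[..., None], flow, 0.0).astype(np.float64)
    known = mask[..., None]
    for _ in range(n_iters):
        # Discrete Laplacian with replicated (Neumann) boundary conditions.
        p = np.pad(u, ((1, 1), (1, 1), (0, 0)), mode="edge")
        lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
        # Explicit Euler step of u_t = Laplacian(u) on unknown pixels only;
        # known flow vectors act as Dirichlet data and stay fixed.
        u = np.where(known, u, u + tau * lap)
    return u

For this explicit scheme the step size tau must stay below 0.25 to remain stable; implicit or multigrid solvers converge faster but are more involved.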
Moving object detection and segmentation from a single moving camera is a challenging task, requiring an understanding of recognition, motion and 3D geometry. Combining both recognition and reconstruction boils down to a fusion problem, where appearance and motion features need to be combined for classification and segmentation. In this paper, we present a novel fusion architecture for monocular motion segmentation, M3Former, which leverages the strong performance of transformers for segmentation and multi-modal fusion. As reconstructing motion from monocular video is ill-posed, we systematically analyze different 2D and 3D motion representations for this problem and their importance for segmentation performance. Finally, we analyze the effect of training data and show that diverse datasets are required to achieve state-of-the-art performance on KITTI and DAVIS. Code will be released upon publication.
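The abstract describes fusing appearance and motion features with a transformer. As a rough illustration of that kind of multi-modal fusion (a generic cross-attention block, not M3Former's actual design; dimensions and layer choices are assumptions), one branch's tokens can attend to the other's:

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    # Illustrative fusion block: appearance tokens attend to motion tokens.
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, appearance, motion):
        # appearance, motion: (B, N, dim) token sequences from the two branches.
        fused, _ = self.attn(query=appearance, key=motion, value=motion)
        x = self.norm(appearance + fused)   # residual connection + normalization
        return x + self.mlp(x)              # feed-forward refinement

fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 196, 256), torch.randn(2, 196, 256))  # (2, 196, 256)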
Physical systems can be used as an information-processing substrate and thereby extend traditional computing architectures. For such an application the experimental platform must guarantee pristine control of the in...
Accurate abnormality localization in chest X-rays (CXR) can benefit the clinical diagnosis of various thoracic diseases. However, the lesion-level annotation can only be performed by experienced radiologists, and it i...
While CNN-based methods have been the cornerstone of medical image segmentation due to their promising performance and robustness, they suffer from limitations in capturing long-range dependencies. Transformer-based a...
Despite the remarkable success of deep learning systems over the last decade, a key difference still remains between neural network and human decision-making: As humans, we can not only form a decision on the spot, but also ponder, revisiting an initial guess from different angles, distilling relevant information, and arriving at a better decision. Here, we propose RecycleNet, a latent feature recycling method that instills in neural networks the capability to ponder: initial decisions are refined over a number of recycling steps, in which outputs are fed back into earlier network layers in an iterative fashion. This approach makes minimal assumptions about the neural network architecture and can thus be implemented in a wide variety of contexts. Using medical image segmentation as the evaluation environment, we show that latent feature recycling enables the network to iteratively refine initial predictions even beyond the iterations seen during training, converging towards an improved decision. We evaluate this across a variety of segmentation benchmarks and show consistent improvements even compared with top-performing segmentation methods. This allows trading increased computation time for improved performance, which can be beneficial, especially for safety-critical applications.
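A minimal sketch of the feature-recycling idea (a toy stand-in, not the RecycleNet architecture; the network sizes, the 1x1 projection, and the additive merge are assumptions): the prediction is projected back into the latent space, merged with the earlier features, and decoded again, and the number of recycling steps can be increased at inference time:

import torch
import torch.nn as nn

class RecyclingSegmenter(nn.Module):
    # Illustrative latent-feature recycling: the prediction is projected back
    # into the latent space and merged with earlier features for another pass.
    def __init__(self, in_ch=3, feat_dim=64, num_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(                    # image -> latent features
            nn.Conv2d(in_ch, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(feat_dim, num_classes, 1)    # features -> logits
        self.recycle_proj = nn.Conv2d(num_classes, feat_dim, 1)

    def forward(self, image, num_steps=3):
        feats = self.encoder(image)
        logits = self.decoder(feats)
        for _ in range(num_steps - 1):
            # Feed the current prediction back into the latent representation
            # and decode again; extra steps can also be run at inference time.
            recycled = self.recycle_proj(torch.softmax(logits, dim=1))
            logits = self.decoder(feats + recycled)
        return logits

model = RecyclingSegmenter()
out = model(torch.randn(1, 3, 64, 64), num_steps=4)   # (1, 4, 64, 64)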