Search Results - Inner Mongolia University Library

IB-MVS: An Iterative Algorithm for Deep Multi-View Stereo based on Binary Decisions

32nd British Machine Vision Conference, BMVC 2021

Authors: Sormann, Christian; Rossi, Mattia; Kuhn, Andreas; Fraundorfer, Friedrich. Affiliations: Institute of Computer Graphics and Vision, Graz University of Technology, Austria; Sony Europe B.V., R&D Center Stuttgart Laboratory 1, Germany.

We present a novel deep-learning-based method for Multi-View Stereo. Our method estimates high-resolution and highly precise depth maps iteratively, by traversing the continuous space of feasible depth values at each pixel in a binary decision fashion. The decision process leverages a deep-network architecture that computes a pixelwise binary mask establishing whether each pixel's actual depth lies in front of or behind its depth hypothesis at the current iteration. Moreover, in order to handle occluded regions, at each iteration the results from different source images are fused using pixelwise weights estimated by a second network. Thanks to the adopted binary decision strategy, which permits an efficient exploration of the depth space, our method can handle high-resolution images without sacrificing resolution or precision. This sets it apart from most alternative learning-based Multi-View Stereo methods, where the explicit discretization of the depth space requires the processing of large cost volumes. We compare our method with state-of-the-art Multi-View Stereo methods on the DTU, Tanks and Temples, and the challenging ETH3D benchmarks and show competitive results. © 2021. The copyright of this document resides with its authors.
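
The binary decision idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the decision network is replaced by an oracle that compares against a known depth map, the update is plain interval bisection, and the names `binary_depth_search`, `d_min`, `d_max` and `num_iters` are hypothetical.

```python
import numpy as np

def binary_depth_search(true_depth, d_min, d_max, num_iters=8):
    """Refine a per-pixel depth hypothesis through repeated in-front/behind decisions."""
    lo = np.full_like(true_depth, d_min, dtype=np.float64)
    hi = np.full_like(true_depth, d_max, dtype=np.float64)
    for _ in range(num_iters):
        # Current per-pixel hypothesis: midpoint of the remaining depth interval.
        hypothesis = 0.5 * (lo + hi)
        # Assumption: an oracle stands in for the network's binary mask.
        # True where the actual depth lies behind the hypothesis.
        behind = true_depth > hypothesis
        # Keep the half of the interval that the decision points to.
        lo = np.where(behind, hypothesis, lo)
        hi = np.where(behind, hi, hypothesis)
    return 0.5 * (lo + hi)
```

With this bisection scheme the remaining interval halves at every iteration, so ten iterations over a range of 9.5 depth units leave an error of at most 9.5 / 2^11 per pixel, without ever storing a cost volume over discretized depths.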

Keywords: Pixels

MATE: Masked Autoencoders are Online 3D Test-Time Learners

International Conference on Computer Vision (ICCV)

Authors: M. Jehanzeb Mirza; Inkyu Shin; Wei Lin; Andreas Schriebl; Kunyang Sun; Jaesung Choe; Mateusz Kozinski; Horst Possegger; In So Kweon; Kuk-Jin Yoon; Horst Bischof. Affiliations: Institute for Computer Graphics and Vision, Graz University of Technology, Austria; Christian Doppler Laboratory for Embedded Machine Learning; Korea Advanced Institute of Science and Technology (KAIST), South Korea; Southeast University, China.

MATE is the first Test-Time-Training (TTT) method designed for 3D data. It makes deep networks trained for point cloud classification robust to distribution shifts occurring in test data. Like existing TTT methods from the 2D image domain, MATE leverages test data for adaptation. Its test-time objective is that of a Masked Autoencoder: a large portion of each test point cloud is removed before it is fed to the network, which is tasked with reconstructing the full point cloud. Once the network is updated, it is used to classify the point cloud. We test MATE on several 3D object classification datasets and show that it significantly improves the robustness of deep networks to several types of corruptions commonly occurring in 3D point clouds. We show that MATE is very efficient in terms of the fraction of points it needs for adaptation: it can adapt effectively given as few as 5% of the tokens of each test sample, making it extremely lightweight. Our experiments show that MATE also achieves competitive performance when adapting sparsely on the test data, which further reduces its computational overhead and makes it well suited to real-time applications.
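
The masking step described above can be sketched as follows. This is only an illustration under stated assumptions: tokens are taken to be rows of an array, the split is uniformly random, and `split_tokens` and `keep_ratio` are hypothetical names rather than anything taken from the MATE code.

```python
import numpy as np

def split_tokens(tokens, keep_ratio=0.05, rng=None):
    """Randomly split point-cloud tokens into a visible subset and a masked remainder."""
    rng = np.random.default_rng() if rng is None else rng
    num_tokens = tokens.shape[0]
    # Keep at least one token so the encoder always receives some input.
    num_keep = max(1, int(round(keep_ratio * num_tokens)))
    order = rng.permutation(num_tokens)
    keep_idx = np.sort(order[:num_keep])   # tokens the network is allowed to see
    mask_idx = np.sort(order[num_keep:])   # tokens the network must reconstruct
    return tokens[keep_idx], keep_idx, mask_idx
```

In a test-time training loop the visible tokens would be passed to the network, the reconstruction error on the masked tokens would drive the update, and the updated network would then classify the sample.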

Keywords: (none listed)

CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video

arXiv, 2022

Authors: Lin, Wei; Kukleva, Anna; Sun, Kunyang; Possegger, Horst; Kuehne, Hilde; Bischof, Horst. Affiliations: Institute of Computer Graphics and Vision, Graz University of Technology, Austria; Max-Planck-Institute for Informatics, Germany; Southeast University, China; Goethe University Frankfurt, Germany; Christian Doppler Laboratory for Semantic 3D Computer Vision.

Although action recognition has achieved impressive results in recent years, both the collection and the annotation of video training data remain time-consuming and cost-intensive. Therefore, image-to-video adaptation has been proposed to exploit label-free web images as a source for adapting to unlabeled target videos. This poses two major challenges: (1) the spatial domain shift between web images and video frames; (2) the modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation. On the one hand, we leverage the joint spatial information in images and videos; on the other hand, we train an independent spatio-temporal model to bridge the modality gap. We alternate between spatial and spatio-temporal learning, with knowledge transfer between the two in each cycle. We evaluate our approach on benchmark datasets for image-to-video as well as mixed-source domain adaptation, achieving state-of-the-art results and demonstrating the benefits of our cyclic adaptation. Code is available at https://***/wlin-at/CycDA. Copyright © 2022, The Authors. All rights reserved.

Keywords: Knowledge management

MATE: Masked Autoencoders are Online 3D Test-Time Learners

arXiv, 2022

Authors: Mirza, M. Jehanzeb; Shin, Inkyu; Lin, Wei; Schriebl, Andreas; Sun, Kunyang; Choe, Jaesung; Possegger, Horst; Kozinski, Mateusz; Kweon, In So; Yoon, Kuk-Jin; Bischof, Horst. Affiliations: Institute for Computer Graphics and Vision, Graz University of Technology, Austria; Christian Doppler Laboratory for Embedded Machine Learning; Republic of Korea; Southeast University, China.

Keywords: Classification (of information)

Video Test-Time Adaptation for Action Recognition

arXiv, 2022

Authors: Lin, Wei; Mirza, Muhammad Jehanzeb; Kozinski, Mateusz; Possegger, Horst; Kuehne, Hilde; Bischof, Horst. Affiliations: Institute for Computer Graphics and Vision, Graz University of Technology, Austria; Christian Doppler Laboratory for Semantic 3D Computer Vision; Christian Doppler Laboratory for Embedded Machine Learning; Goethe University Frankfurt, Germany; MIT-IBM Watson AI Lab, United States.

Although action recognition systems can achieve top performance when evaluated on in-distribution test points, they are vulnerable to unanticipated distribution shifts in test data. However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that is capable of adapting on a single video sample per step. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics with the training statistics. We further enforce prediction consistency over temporally augmented views of the same test video sample. Evaluations on three benchmark action recognition datasets show that our proposed technique is architecture-agnostic and significantly boosts performance on both the state-of-the-art convolutional architecture TANet and the Video Swin Transformer. Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches, both when evaluated on a single distribution shift and in the challenging case of random distribution shifts. Code will be available at https://***/wlin-at/ViTTA. Copyright © 2022, The Authors. All rights reserved.
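
A rough sketch of aligning online test statistics with stored training statistics is given below. The abstract does not specify the estimator or the distance, so the exponential moving average, the L1 discrepancy, and the names `OnlineStats`, `momentum` and `alignment_loss` are all assumptions made for illustration.

```python
import numpy as np

class OnlineStats:
    """Exponential moving average of per-channel feature mean and variance."""

    def __init__(self, num_channels, momentum=0.1):
        self.momentum = momentum
        self.mean = np.zeros(num_channels)
        self.var = np.ones(num_channels)
        self.initialized = False

    def update(self, features):
        # features: (num_positions, num_channels) activations from one test video.
        batch_mean = features.mean(axis=0)
        batch_var = features.var(axis=0)
        if not self.initialized:
            # The first sample initializes the running estimates directly.
            self.mean, self.var = batch_mean, batch_var
            self.initialized = True
        else:
            m = self.momentum
            self.mean = (1 - m) * self.mean + m * batch_mean
            self.var = (1 - m) * self.var + m * batch_var
        return self.mean, self.var

def alignment_loss(test_mean, test_var, train_mean, train_var):
    """L1 discrepancy between online test statistics and training statistics."""
    return np.abs(test_mean - train_mean).sum() + np.abs(test_var - train_var).sum()
```

In an actual adaptation loop this quantity would be computed inside an automatic differentiation framework and minimized with respect to the model parameters, one test video at a time; the NumPy version only shows which quantities are compared.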

Keywords: Benchmarking

CC-DCNet: Dynamic Convolutional Neural Network with Contrastive Constraints for Identifying Lung Cancer Subtypes on Multi-modality Images

arXiv, 2024

Authors: Jin, Yuan; Ma, Gege; Chen, Geng; Lyu, Tianling; Egger, Jan; Lyu, Junhui; Zhang, Shaoting; Zhu, Wentao. Affiliations: Zhejiang Lab, 311121, China; Institute of Computer Graphics and Vision, Graz University of Technology, Graz 8010, Austria; School of Computer Science and Engineering, Northwestern Polytechnical University, Shaanxi, Xi’an 710072, China; The Zhejiang University School of Medicine, Sir Run Run Shaw Hospital, Hangzhou 310016, China; Shanghai Artificial Intelligence Laboratory, Shanghai 200120, China.

The accurate diagnosis of pathological subtypes of lung cancer is of paramount importance for follow-up treatment and prognosis management. Assessment methods utilizing deep learning technologies have introduced novel approaches for clinical diagnosis. However, the majority of existing models rely solely on single-modality image input, leading to limited diagnostic accuracy. To this end, we propose a novel deep learning network designed to accurately classify lung cancer subtypes from multi-dimensional and multi-modality images, i.e., CT and pathological images. The strength of the proposed model lies in its ability to dynamically process both paired CT-pathological image sets and independent CT image sets, and consequently to optimize the extraction of pathology-related features from CT images. This adaptive learning approach enhances the flexibility in processing multi-dimensional and multi-modality datasets and improves performance in the model testing phase. We also develop a contrastive constraint module, which quantitatively maps the cross-modality associations through network training and thereby helps to explore the "gold standard" pathological information from the corresponding CT scans. To evaluate the effectiveness, adaptability, and generalization ability of our model, we conducted extensive experiments on a large-scale multi-center dataset and compared our model with a series of state-of-the-art classification models. The experimental results demonstrated the superiority of our model for lung cancer subtype classification, showing significant improvements in metrics such as ACC, AUC, and F1-score. Copyright © 2024, The Authors. All rights reserved.

Keywords: Lung cancer

MD-Net: Multi-Detector for Local Feature Extraction

arXiv, 2022

Authors: Santellani, Emanuele; Sormann, Christian; Rossi, Mattia; Kuhn, Andreas; Fraundorfer, Friedrich. Affiliations: Institute of Computer Graphics and Vision, Graz University of Technology, Austria; R&D Center Stuttgart Laboratory 1, Sony Europe B.V., Germany.

Establishing a sparse set of keypoint correspondences between images is a fundamental task in many computer vision pipelines. Often, this translates into a computationally expensive nearest neighbor search, where every keypoint descriptor in one image must be compared with all the descriptors in the others. In order to lower the computational cost of the matching phase, we propose a deep feature extraction network capable of detecting a predefined number of complementary sets of keypoints in each image. Since only the descriptors within the same set need to be compared across the different images, the computational complexity of the matching phase decreases with the number of sets. We train our network to predict the keypoints and compute the corresponding descriptors jointly. In particular, in order to learn complementary sets of keypoints, we introduce a novel unsupervised loss which penalizes intersections among the different sets. Additionally, we propose a novel descriptor-based weighting scheme meant to penalize the detection of keypoints with non-discriminative descriptors. With extensive experiments we show that our feature extraction network, trained only on synthetically warped images and in a fully unsupervised manner, achieves competitive results on 3D reconstruction and re-localization tasks at a reduced matching complexity. Copyright © 2022, The Authors. All rights reserved.
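
The claim that matching cost decreases with the number of sets can be checked with a small count of descriptor comparisons. This is a worked arithmetic sketch, not code from the paper: the function names are hypothetical and the keypoints are assumed to be split evenly across sets.

```python
def brute_force_comparisons(n_a, n_b):
    """Descriptor comparisons when every keypoint in A is compared with every keypoint in B."""
    return n_a * n_b

def multi_set_comparisons(sizes_a, sizes_b):
    """Descriptor comparisons when only keypoints in the same set are compared."""
    return sum(a * b for a, b in zip(sizes_a, sizes_b))

# Assumed example: 1000 keypoints per image, split evenly into 4 sets of 250.
print(brute_force_comparisons(1000, 1000))             # 1000000
print(multi_set_comparisons([250] * 4, [250] * 4))     # 250000
```

With N keypoints per image split evenly into K sets, the count drops from N^2 to K * (N / K)^2 = N^2 / K, which is the inverse dependence on the number of sets that the abstract refers to.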

Keywords: Feature extraction

DELS-MVS: Deep Epipolar Line Search for Multi-View Stereo

arXiv, 2022

Authors: Sormann, Christian; Santellani, Emanuele; Rossi, Mattia; Kuhn, Andreas; Fraundorfer, Friedrich. Affiliations: Graz University of Technology, Institute of Computer Graphics and Vision, Austria; Sony Europe B.V., R&D Center Stuttgart Laboratory 1, Germany.

We propose a novel approach for deep learning-based Multi-View Stereo (MVS). For each pixel in the reference image, our method leverages a deep architecture to search for the corresponding point in the source image directly along the corresponding epipolar line. We denote our method DELS-MVS: Deep Epipolar Line Search Multi-View Stereo. Previous works in deep MVS select a range of interest within the depth space, discretize it, and sample the epipolar line according to the resulting depth values: this can result in an uneven scanning of the epipolar line, and hence of the image space. Instead, our method works directly on the epipolar line: this guarantees an even scanning of the image space and avoids both the need to select a depth range of interest, which is often not known a priori and can vary dramatically from scene to scene, and the need for a suitable discretization of the depth space. In fact, our search is iterative, which avoids building a cost volume that is costly both to store and to process. Finally, our method performs a robust geometry-aware fusion of the estimated depth maps, leveraging a confidence predicted alongside each depth. We test DELS-MVS on the ETH3D, Tanks and Temples, and DTU benchmarks and achieve competitive results with respect to state-of-the-art approaches. Copyright © 2022, The Authors. All rights reserved.
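
The uneven scanning caused by depth discretization can be reproduced with a toy pinhole setup. The sketch below only illustrates that effect and is not part of DELS-MVS: the intrinsics `K`, the pose `R`, `t` and the function name `project_depth_samples` are assumed values chosen for the example.

```python
import numpy as np

def project_depth_samples(pixel, depths, K, R, t):
    """Project one reference pixel, at several candidate depths, into a source image."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    points = depths[:, None] * ray[None, :]   # 3D points in the reference camera frame
    cam = points @ R.T + t                    # the same points in the source camera frame
    proj = cam @ K.T                          # homogeneous pixel coordinates
    return proj[:, :2] / proj[:, 2:3]

# Assumed setup: shared intrinsics, source camera shifted 0.2 units sideways.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-0.2, 0.0, 0.0])
depths = np.linspace(1.0, 10.0, 10)           # uniformly spaced depth hypotheses
uv = project_depth_samples((320.0, 240.0), depths, K, R, t)
steps = np.diff(uv[:, 0])                     # pixel distance between consecutive samples
print(steps)
```

In this configuration the projected column is 320 - 100 / d, so the first two depth samples land 50 pixels apart on the epipolar line while the last two land only about 1.1 pixels apart. Uniform steps in depth therefore give highly non-uniform steps in the image, which is the motivation the abstract gives for searching along the epipolar line directly.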

Keywords: Iterative methods

Classification of Lung Cancer Subtypes on CT Images with Synthetic Pathological Priors

arXiv, 2023

Authors: Zhu, Wentao; Jin, Yuan; Ma, Gege; Chen, Geng; Egger, Jan; Zhang, Shaoting; Metaxas, Dimitris N. Affiliations: Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou 311121, China; School of Computer Science and Engineering, Northwestern Polytechnical University, Shaanxi, Xi’an 710072, China; Institute of Computer Graphics and Vision, Graz University of Technology, Graz 8010, Austria; Shanghai Artificial Intelligence Laboratory, Shanghai 200120, China; Department of Computer Science, Rutgers University, Piscataway, NJ 08854, United States.

The accurate diagnosis of pathological subtypes of lung cancer is of significant importance for follow-up treatment and prognosis management. In this paper, we propose a self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns between the same case’s CT images and its pathological images, we developed a pathological feature synthetic module (PFSM), which quantitatively maps cross-modality associations through deep neural networks, to derive the "gold standard" information contained in the corresponding pathological images from CT images. Additionally, we designed a radiological feature extraction module (RFEM) to directly acquire CT image information and integrated it with the pathological priors under an effective feature fusion framework, enabling the entire classification model to generate more indicative and specific pathologically related features and eventually output more accurate predictions. The superiority of the proposed model lies in its ability to self-generate hybrid features that contain multi-modality image information based on a single-modality input. To evaluate the effectiveness, adaptability, and generalization ability of our model, we performed extensive experiments on a large-scale multi-center dataset (i.e., 829 cases from three hospitals) to compare our model with a series of state-of-the-art (SOTA) classification models. The experimental results demonstrated the superiority of our model for lung cancer subtype classification, with significant improvements in accuracy (ACC), area under the curve (AUC), and F1 score. © 2023, CC0.

Keywords: Classification (of information)