ISBN (digital): 9798350365474
ISBN (print): 9798350365481
This paper outlines our submission for the 4th COV19D competition, held as part of the ‘Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis’ (DEFAI-MIA) workshop at the Computer Vision and Pattern Recognition conference (CVPR). The competition consists of two challenges. The first is to train a classifier to detect the presence of COVID-19 from over one thousand CT scans from the COV19-CT-DB database. The second is to perform domain adaptation by taking the dataset from Challenge 1 and adding a small number of scans (some annotated and others not) from a different distribution. We preprocessed the CT scans to segment the lungs and output volumes containing the lungs both individually and together. We then trained 3D ResNet and Swin Transformer models on these inputs. We annotated the unlabeled CT scans using an ensemble of these models and kept the high-confidence predictions as pseudo-labels for fine-tuning. This achieved the winning macro F1 score of 94.89% in Challenge 1 of the competition and the second-best macro F1 score of 77.21% in Challenge 2.
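The ensemble pseudo-labeling step described above can be illustrated with a minimal sketch. This is not the authors' released code: the model list, batch-of-one loader, and the 0.9 confidence cutoff are assumptions for illustration.

import torch

CONF_THRESH = 0.9  # hypothetical cutoff for "high-confidence" predictions

@torch.no_grad()
def pseudo_label(models, unlabeled_loader, device="cuda"):
    """Average the ensemble's softmax outputs; keep only confident scans."""
    pseudo = []
    for volume, scan_id in unlabeled_loader:  # assumes one scan per batch
        volume = volume.to(device)
        # Mean of per-model class probabilities acts as the ensemble prediction.
        probs = torch.stack([m(volume).softmax(dim=1) for m in models]).mean(dim=0)
        conf, label = probs.max(dim=1)
        if conf.item() >= CONF_THRESH:
            pseudo.append((scan_id, label.item()))
    return pseudo  # (scan_id, pseudo-label) pairs used for fine-tuning

Scans below the threshold are simply left unlabeled, so fine-tuning only ever sees predictions the ensemble agrees on.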
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
The implementation of multi-target multi-camera tracking systems in indoor environments, including shops and warehouses, facilitates strategic product positioning and the improvement of operational workflows. This paper presents the online multi-target multi-camera tracking framework OCMCTrack, which tracks the 3D positions of people in the world. The proposed framework introduces a novel matching cascade to re-evaluate track assignments dynamically, thus minimizing false positive associations often made by online trackers. Additionally, this work presents three effective methods to enhance the transformation of a person’s position in the image to world coordinates, thereby addressing common inaccuracies in positional reference points. The proposed methodology is able to achieve competitive performance in Track 1 of the 2024 AI City Challenge, demonstrating the effectiveness of the framework.
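The image-to-world transformation the paper improves builds on a standard ground-plane homography. The sketch below shows only that baseline mapping, not the paper's three refinement methods; the bottom-center reference point and pre-calibrated homography H are assumed conventions.

import numpy as np

def image_to_world(bbox_xyxy, H):
    """Map a person's bottom-center image point to 2D world coordinates."""
    x1, y1, x2, y2 = bbox_xyxy
    foot = np.array([(x1 + x2) / 2.0, y2, 1.0])  # homogeneous image point
    w = H @ foot                                 # 3x3 ground-plane homography
    return w[:2] / w[2]                          # dehomogenize -> (X, Y)

Inaccuracies in this reference point (occluded feet, bounding-box jitter) are exactly what the paper's positional refinements target.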
This demonstration shows live operation of PDAVIS polarization event camera reconstruction by the E2P DNN reported in the main CVPR conference paper Deep Polarization Reconstruction with PDAVIS Events (paper 9149 [7]). Demo code: ***/SensorsINI/e2p
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Face recognition systems are widely used in real-world scenarios but are susceptible to both physical and digital attacks. Effective methods for unified detection of physical and digital face attacks are essential to ensure the reliability of face recognition systems. However, how to obtain a unified face attack detection model with both adequate fine-grained perception and cross-domain generalization ability remains an open challenge. To address this issue, we first propose a two-stage training strategy that uses unlabeled face images with masked image modeling to unleash the potential of vision transformers. Furthermore, we propose a novel method termed Micro Disturbance, which enriches the representation distribution of forged faces and increases the diversity of the training data, thereby improving cross-domain generalization. Owing to the effectiveness of the proposed methods, our model won third place in the 5th Face Anti-Spoofing Challenge@CVPR2024, with an ACER score of 5.511.
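The abstract does not spell out Micro Disturbance's exact formulation, so the sketch below shows one plausible reading: a small, scaled random perturbation applied to forged-face features during training to diversify their distribution. The epsilon value and the feature-space placement are assumptions.

import torch

def micro_disturbance(features, is_attack, epsilon=0.01):
    """Perturb attack-sample features with small, norm-scaled Gaussian noise."""
    noise = torch.randn_like(features) * epsilon * features.norm(dim=1, keepdim=True)
    mask = is_attack.float().unsqueeze(1)  # disturb only forged samples
    return features + mask * noise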
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Generating consistent multiple views for 3D reconstruction tasks remains a challenge for existing image-to-3D diffusion models. In general, incorporating 3D representations into a diffusion model decreases the model’s speed as well as its generalizability and quality. This paper proposes a general framework for generating consistent multi-view images from a single image by leveraging a scene representation transformer and a view-conditioned diffusion model. In the model, we introduce epipolar geometry constraints and multi-view attention to enforce 3D consistency. From as few as one input image, our model generates 3D meshes that surpass baseline methods in evaluation metrics, including PSNR, SSIM and LPIPS.
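One common way to impose an epipolar geometry constraint on cross-view attention is to gate attention weights so a query pixel can only attend near its epipolar line in the other view. The sketch below shows that standard construction, not the paper's exact attention design; known intrinsics K1, K2, relative pose (R, t), and the pixel tolerance are assumptions.

import numpy as np

def skew(t):
    return np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])

def epipolar_mask(pts1, pts2, K1, K2, R, t, tol=2.0):
    """Allow attention only where pts2 lie near the epipolar lines of pts1."""
    F = np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)  # fundamental matrix
    ones1 = np.ones((len(pts1), 1))
    ones2 = np.ones((len(pts2), 1))
    lines = (F @ np.hstack([pts1, ones1]).T).T            # epipolar lines in view 2
    lines /= np.linalg.norm(lines[:, :2], axis=1, keepdims=True)  # unit (a, b)
    # Point-to-line distance for every (query, key) pixel pair.
    d = np.abs(lines @ np.hstack([pts2, ones2]).T)        # shape (N1, N2)
    return d < tol  # boolean mask gating cross-view attention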
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Microscopy images often feature regions of low signal-to-noise ratio (SNR), which leads to considerable ambiguity in the correct corresponding segmentation. This ambiguity can introduce inconsistencies in the segmentation mask that violate known biological constraints. In this work, we present a methodology that identifies areas of low SNR and refines the segmentation masks so that they are consistent with biological structures. Low-SNR regions with uncertain segmentation are detected using model ensembling and selectively restored by a masked autoencoder (MAE) that leverages information from well-imaged surrounding areas. The prior knowledge of biologically consistent segmentation masks is learned directly from the ***. We validate our approach in the context of analysing intracellular structures, specifically by refining segmentation masks of mitochondria in expansion microscopy images with a global staining.
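A minimal sketch of the two-step idea: flag ambiguous pixels via ensemble disagreement, then let an MAE-style model re-predict only those regions from the confident context. The mae call signature, the sigmoid heads, and the variance threshold are placeholders, not the paper's implementation.

import torch

@torch.no_grad()
def refine_segmentation(image, ensemble, mae, var_thresh=0.05):
    probs = torch.stack([m(image).sigmoid() for m in ensemble])  # (E, 1, 1, H, W)
    mean, var = probs.mean(dim=0), probs.var(dim=0)
    uncertain = var > var_thresh          # high disagreement ~ low-SNR / ambiguous
    mask = (mean > 0.5).float()           # initial binary segmentation
    # The MAE sees the confident context and restores the blanked, uncertain
    # regions so the result stays consistent with learned biological structure.
    restored = mae(mask.masked_fill(uncertain, 0.0), uncertain)
    return torch.where(uncertain, restored, mask)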
ISBN (print): 9781665448994
In class incremental learning, the number of classes to be handled grows dynamically with the number of considered tasks. The main challenge of this learning scheme is catastrophic forgetting, that is, performance degradation on old tasks after learning new tasks. Existing incremental learning algorithms generally train a multi-class classifier (e.g. a softmax classifier), which learns a decision boundary that divides the feature space into several parts. Therefore, when new data arrive, the learned boundary must be updated, which may cause forgetting. Compared with multi-class classifiers, a one-class classifier focuses on characterizing the distribution of a single class. As a result, the decision boundary learned for each category is tighter and does not change while learning new tasks. Inspired by this characteristic of one-class classifiers, we propose a novel Incremental Learning framework based on Contrastive One-class Classifiers (ILCOC) to avoid catastrophic forgetting. Specifically, we train a dedicated one-class classifier for each category and use them in parallel to achieve incremental multi-class recognition. In addition, we design a scale-boundary loss, a classifier-contrastive loss and a negative-suppression loss to strengthen the comparability of the classifiers’ outputs and the discrimination ability of each one-class classifier. We evaluate ILCOC on the MNIST, CIFAR-10 and Tiny-ImageNet datasets, and the experimental results show that ILCOC achieves state-of-the-art performance.
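The inference side of this design is simple to sketch: each one-class head scores how well a feature matches its own class, and the highest score wins, so adding a class only adds a head and never moves existing boundaries. The three losses are not reproduced here; classifiers is assumed to be a list of trained heads each returning a (B,)-shaped score.

import torch

@torch.no_grad()
def ilcoc_predict(feature, classifiers):
    """Multi-class recognition from parallel one-class scores."""
    scores = torch.stack([clf(feature) for clf in classifiers], dim=1)  # (B, C)
    return scores.argmax(dim=1)  # new classes just append new heads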
ISBN (print): 9781665448994
Despite significant recent developments, visual assistance systems are still severely constrained by sensor capabilities, form factor, battery power consumption, computational resources and the use of traditional computer vision algorithms. Current visual assistance systems cannot adequately perform complex computer vision tasks that entail deep learning. We present the design and implementation of a novel visual assistance system that employs deep learning and point cloud processing to perform advanced perception tasks on a cost-effective, low-power mobile computing platform. The proposed system design circumvents the need for the expensive, power-intensive Graphical Processing Unit (GPU)-based hardware required by most deep learning algorithms for real-time inference by instead employing edge Artificial Intelligence (AI) accelerators such as the Neural Compute Stick-2 (NCS2), model optimization toolkits such as OpenVINO and TensorFlow Lite, and smart depth sensors such as the OpenCV AI Kit-Depth (OAK-D). Critical system design challenges such as training data collection, real-time capability, computational efficiency, power consumption, portability and reliability are addressed. The proposed system includes more advanced functionality than existing systems, such as assessment of traffic conditions and detection and localization of hanging obstacles, crosswalks, moving obstacles and sudden elevation changes. The system design incorporates an AI-based voice interface that allows for user-friendly interaction and control, and is shown to realize a simple, cost-effective, power-efficient, portable and unobtrusive visual assistance device.
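For a sense of what inference without a GPU looks like, here is a minimal on-device pass with TensorFlow Lite, one of the toolkits named above. The model file and its output format are hypothetical, not the authors' actual network.

import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="detector.tflite")  # hypothetical model file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def detect(frame):
    """Run one inference pass on a preprocessed camera frame."""
    x = np.expand_dims(frame, 0).astype(inp["dtype"])  # add batch dimension
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])  # raw detections to post-process

The same preprocessed frame could instead be dispatched to an NCS2 through the OpenVINO runtime; the point is that the heavy lifting moves to a USB-attached accelerator rather than a GPU.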
Conditional graphic layout generation, which generates realistic layouts according to user constraints, is a challenging task that has not been well-studied yet. First, there is limited discussion about how to handle ...
ISBN (print): 9781665448994
Relighting is an interesting yet challenging low-level vision problem, which aims to re-render the scene with new light sources. In this paper, we introduce LTNet, a novel framework for image relighting. Unlike previous methods, we propose to solve this challenging problem by decoupling the enhancement process. Specifically, we propose to train a network that focuses on learning light variations. Our key insight is that light variations are the critical information to be learned because the scene stays unchanged during the light transfer process. To this end, we employ a global residual connection and corresponding residual loss for capturing light variations. Experimental results show that the proposed method achieves better visual quality on the VIDIT dataset in the NTIRE2021 relighting challenge.
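The global residual connection reduces to a one-line change: the network predicts only the light variation, which is added back to the unchanged scene content, and the residual loss supervises that variation directly. In this sketch, backbone is a placeholder encoder-decoder and the L1 form of the residual loss is an assumption.

import torch
import torch.nn.functional as F

def relight(backbone, source):
    residual = backbone(source)         # network learns only the light variation
    return source + residual, residual  # global residual connection

def residual_loss(residual, source, target):
    # Supervise the predicted variation against the true light change.
    return F.l1_loss(residual, target - source)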