ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Multi-class cell detection (cancer or non-cancer) from a whole slide image (WSI) is an important task for pathological diagnosis. Cancer and non-cancer cells often have a similar appearance, so it is difficult even for experts to classify a cell from a patch image of an individual cell. Experts usually identify the cell type on the basis of not only the appearance of a single cell but also the context of the surrounding cells. To exploit such information, we propose a multi-class cell-detection method that introduces a modified self-attention to aggregate the surrounding image features of both classes. Experimental results demonstrate the effectiveness of the proposed method: it outperformed a baseline that simply uses standard self-attention.
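For reference, the standard self-attention baseline mentioned above can be sketched as follows. This is a minimal illustration, not the paper's modified attention: it assumes each detected cell is described by a plain feature vector and, for brevity, uses the features directly as queries, keys, and values with no learned projections. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: queries, keys, and values are the raw features
    (no learned projection matrices), which a real model would add.
    Returns (aggregated_features, attention_weights).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in features:
        # Similarity of this cell to every cell, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        row = softmax(scores)
        # Weighted sum of all cell features: the aggregated context.
        aggregated = [sum(w * value[i] for w, value in zip(row, features))
                      for i in range(dim)]
        outputs.append(aggregated)
        weights.append(row)
    return outputs, weights


cells = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, attn = self_attention(cells)
```

Each output row mixes a cell's own feature with those of its neighbours, which is the sense in which attention aggregates surrounding context.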
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Recent approaches to multi-task learning (MTL) have focused on modelling connections between tasks at the decoder level. This leads to tight coupling between tasks, which must be retrained whenever a task is added or removed. We argue that MTL is a stepping stone towards universal feature learning (UFL), which is the ability to learn generic features that can be applied to new tasks without retraining. We propose Medusa to realize this goal, designing task heads with dual attention mechanisms. The shared feature attention masks relevant backbone features for each task, allowing it to learn a generic representation. Meanwhile, a novel Multi-Scale Attention head allows the network to better combine per-task features from different scales when making the final prediction. We show the effectiveness of Medusa in UFL (+13.18% improvement), while maintaining MTL performance and being 25% more efficient than previous approaches.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Recent progress in 3D display technologies has raised the demand for stylized 3D digital content. Previous approaches either perform style transfer on stereoscopic image pairs or reconstruct a 3D environment from multiple view images. In this paper, we propose a novel view stylization framework that can convert a single 2D image into multiple stylized views. It is a two-stage solution consisting of view synthesis and neural style transfer. We estimate dense optical flow between the source and novel views so that the style transfer model can produce consistent results. Experimental results show that our method significantly improves the consistency among views compared to the baseline method.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Action recognition in the dark is gaining more and more attention with the rapid development of intelligent recognition in real-world applications, e.g. self-driving at night and night surveillance. However, because labeling is expensive, it is impractical to produce a large-scale labeled dataset only for dark environments. A practical solution is therefore to transfer models trained in clear environments to dark environments through semi-supervised learning. However, prior works rely heavily on additional efforts such as extra annotations or extra sensors. To this end, we propose a novel and simple Domain Adaptable Normalization (DANorm) method to align different domains directly, which consists of feature normalization, an angle constraint, and Pseudo-Label. Specifically, the proposed DANorm method enables the model to automatically learn the associated features between the labeled source domain and the unlabeled target domain by constraining the feature subspace vectors. Experimental results show that our model achieves superior performance on the Semi-supervised ARID dataset.
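The abstract does not spell out how DANorm uses pseudo-labels, so the following is only a generic sketch of confidence-thresholded pseudo-labeling as commonly used in semi-supervised learning. The function name `pseudo_labels` and the threshold value of 0.9 are assumptions made for illustration, not details from the paper.

```python
def pseudo_labels(probabilities, threshold=0.9):
    """Select confident predictions on unlabeled target samples.

    probabilities: one list of class probabilities per unlabeled sample.
    threshold: hypothetical confidence cut-off (an assumption here).
    Returns (sample_index, class_index) pairs that can be treated as
    labels in the next training round; uncertain samples are skipped.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        best_class = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best_class] >= threshold:
            selected.append((index, best_class))
    return selected


# Three unlabeled dark-domain clips, two action classes.
predictions = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
labels = pseudo_labels(predictions)
```

Only the first and third samples pass the threshold; the ambiguous second sample stays unlabeled rather than risk reinforcing a wrong prediction.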
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Under the new norm of working from home, demand for fitness from home is on the rise. Different exercise forms meet different fitness needs for different people. Yoga gives flexibility and relieves stress. Pilates strengthens the muscles. Kung Fu brings balance. It is not feasible for everyone to hire a personal trainer. In this paper, we develop Pose Tutor, an AI-based explainable pose recognition and correction system. Pose Tutor combines vision and pose skeleton models in a novel coarse-to-fine framework to obtain pose class predictions. An angle-likelihood mechanism is used to explain which human joints contributed most to the pose class predictions and to correct any wrongly formed joints. Even without keypoint-level training, Pose Tutor shows promising results on the Yoga-82, Pilates-32, and Kungfu-7 datasets. Additionally, user studies conducted with multiple domain experts validate the explanations provided by our framework.
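An angle-based explanation needs joint angles as input. The sketch below shows one common way to compute the angle at a joint from three 2D keypoints; it is an illustration under that assumption, not Pose Tutor's angle-likelihood mechanism itself, and the function name `joint_angle` is hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by segments b->a and b->c.

    Example: a = shoulder, b = elbow, c = wrist gives the elbow angle.
    Keypoints are (x, y) tuples in any consistent coordinate system.
    """
    ba = (a[0] - b[0], a[1] - b[1])
    bc = (c[0] - b[0], c[1] - b[1])
    norm_ba = math.hypot(*ba)
    norm_bc = math.hypot(*bc)
    if norm_ba == 0 or norm_bc == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cosine = (ba[0] * bc[0] + ba[1] * bc[1]) / (norm_ba * norm_bc)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


elbow = joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
```

Comparing such angles against those typical for a pose class is one plausible way to flag which joint deviates most.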
ISBN (print): 9781665487399
In clinical settings, where acquisition conditions and patient populations change over time, continual learning is key for ensuring the safe use of deep neural networks. Yet most existing work focuses on convolutional architectures and image classification. Instead, radiologists prefer to work with segmentation models that outline specific regions-of-interest, for which Transformer-based architectures are gaining traction. The self-attention mechanism of Transformers could potentially mitigate catastrophic forgetting, opening the way for more robust medical image segmentation. In this work, we explore how recently-proposed Transformer mechanisms for semantic segmentation behave in sequential learning scenarios, and analyse how best to adapt continual learning strategies for this setting. Our evaluation on hippocampus segmentation shows that Transformer mechanisms mitigate catastrophic forgetting for medical image segmentation compared to purely convolutional architectures, and demonstrates that regularising ViT modules should be done with caution.
Aligning image and text encoders from scratch using contrastive learning requires large amounts of paired image-text data. We alleviate this need by aligning individually pre-trained language and vision representation...
ISBN (print): 9798350370287; 9798350370713
We propose a method for Saimaa ringed seal (Pusa hispida saimensis) re-identification. Access to large image volumes through camera trapping and crowdsourcing provides novel possibilities for animal conservation and monitoring and calls for automatic methods for analysis, in particular when re-identifying individual animals from the images. The proposed method, NOvel Ringed seal re-identification by Pelage pattern Aggregation (NORPPA), utilizes the permanent and unique pelage pattern of Saimaa ringed seals and content-based image retrieval techniques. First, the query image is preprocessed, and each seal instance is segmented. Next, the seal's pelage pattern is extracted using a U-net encoder-decoder based method. Then, CNN-based affine invariant features are embedded and aggregated into Fisher Vectors. Finally, the cosine distance between the Fisher Vectors is used to find the best match from a database of known individuals. We perform extensive experiments on various modifications of the method using a challenging Saimaa ringed seal re-identification dataset. The proposed method is shown to produce the best re-identification accuracy on our dataset compared with alternative approaches.
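The final retrieval step described above can be sketched as a nearest-neighbour search under cosine distance. This is a minimal illustration that assumes the Fisher Vectors have already been computed and are stored as plain lists of floats; the names `cosine_distance`, `best_match`, and the seal identifiers are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def best_match(query, database):
    """Return the ID of the known individual closest to the query.

    database: dict mapping individual ID -> aggregated descriptor.
    """
    return min(database,
               key=lambda name: cosine_distance(query, database[name]))


# Toy descriptors standing in for real Fisher Vectors.
known_seals = {
    "seal_a": [1.0, 0.0, 0.0],
    "seal_b": [0.0, 1.0, 0.2],
}
match = best_match([0.1, 0.9, 0.1], known_seals)
```

Cosine distance ignores vector magnitude, so two descriptors that differ only in overall scale are treated as identical.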
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
This paper reviews the NTIRE 2022 challenge on learning the super-resolution space. This challenge aims to raise awareness that the super-resolution problem is ill-posed. Since many high-resolution images map to the same low-resolution image, we asked the participants to create methods that sample diverse super-resolutions from the space of possible high-resolution images given a low-resolution image. For evaluation, we use the same protocol as introduced in last year's super-resolution space challenge of NTIRE 2021. We compare the submissions of the participating teams and relate them to the approaches from last year. This challenge contains two tracks: 4x and 8x scale factors. In total, 3 teams competed in the final testing phase.
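The ill-posedness can be seen in a toy one-dimensional case. The sketch below assumes downsampling by block averaging, which is a simplification chosen for illustration (the challenge's actual degradation model is not stated in the abstract); the function name `downsample` is hypothetical.

```python
def downsample(signal, factor):
    """Downsample a 1D signal by averaging non-overlapping blocks."""
    if len(signal) % factor != 0:
        raise ValueError("signal length must be a multiple of factor")
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


# Two different "high-resolution" signals...
hr_a = [1.0, 3.0, 5.0, 7.0]
hr_b = [2.0, 2.0, 6.0, 6.0]

# ...that collapse to the same "low-resolution" signal at 2x.
lr_a = downsample(hr_a, 2)  # [2.0, 6.0]
lr_b = downsample(hr_b, 2)  # [2.0, 6.0]
```

Because both inputs yield the same output, no method can recover a unique high-resolution signal from the low-resolution one, which is why sampling a diverse set of plausible reconstructions is a reasonable goal.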
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
State-of-the-art object recognition methods do not generalize well to unseen domains. Work in domain generalization has attempted to bridge domains by increasing feature compatibility, but has focused on standard, appearance-based representations. We show the potential of shape-based representations to increase domain robustness. We compare two types of shape-based representations: one trains a convolutional network over edge features, and another computes a soft, dense medial axis transform. We show the complementary strengths of these representations for different types of domains, and the effect of the amount of texture that is preserved. We show that our shape-based techniques better leverage data augmentations for domain generalization, and are more effective at texture bias mitigation than shape-inducing augmentations. Finally, we show that when the convolutional network in state-of-the-art domain generalization methods is replaced with one that explicitly captures shape, we obtain improved results.