ISBN: (Print) 9781665487399
The analysis of the multi-layer structure of wild forests is an important challenge in automated large-scale forestry. While modern aerial LiDARs offer geometric information across all vegetation layers, most datasets and methods focus only on the segmentation and reconstruction of the top of the canopy. We release WildForest3D, which consists of 29 study plots and over 2,000 individual trees across 47,000 m² with dense 3D annotation, along with occupancy and height maps for three vegetation layers: ground vegetation, understory, and overstory. We propose a 3D deep network architecture that, for the first time, predicts both 3D pointwise labels and high-resolution layer occupancy rasters simultaneously. This allows us to produce a precise estimation of the thickness of each vegetation layer as well as the corresponding watertight meshes, thereby meeting most forestry purposes. Both the dataset and the model are released in open access: https://***/ekalinicheva/multi_layer_vegetation.
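As a hedged illustration of jointly predicting pointwise labels and layer-occupancy rasters, the minimal PyTorch sketch below pairs a per-point classifier with a per-cell occupancy head computed from mean-pooled point features; all module names, dimensions, and the pooling scheme are assumptions for illustration, not the released architecture.

```python
# Sketch only: joint per-point semantic head and 2D layer-occupancy head.
# All names and dimensions are illustrative, not the paper's released model.
import torch
import torch.nn as nn

class PointAndRasterHead(nn.Module):
    def __init__(self, feat_dim=64, num_classes=4, num_layers=3, grid=32):
        super().__init__()
        self.grid = grid
        # Per-point semantic classifier (e.g. ground vegetation / understory / overstory / other).
        self.point_head = nn.Linear(feat_dim, num_classes)
        # Per-cell occupancy logits for the vegetation layers, predicted from pooled cell features.
        self.raster_head = nn.Linear(feat_dim, num_layers)

    def forward(self, feats, xy):
        # feats: (N, feat_dim) point features; xy: (N, 2) coordinates normalised to [0, 1).
        point_logits = self.point_head(feats)                      # (N, num_classes)
        cell = (xy * self.grid).long().clamp(0, self.grid - 1)     # grid indices per point
        idx = cell[:, 0] * self.grid + cell[:, 1]                  # flat cell index
        pooled = torch.zeros(self.grid * self.grid, feats.size(1))
        count = torch.zeros(self.grid * self.grid, 1)
        pooled.index_add_(0, idx, feats)                           # mean-pool point features per cell
        count.index_add_(0, idx, torch.ones(feats.size(0), 1))
        pooled = pooled / count.clamp(min=1)
        occupancy_logits = self.raster_head(pooled)
        return point_logits, occupancy_logits.view(self.grid, self.grid, -1)

# Usage: combine a point-wise cross-entropy with a per-cell binary occupancy loss.
head = PointAndRasterHead()
point_logits, occ_logits = head(torch.randn(1000, 64), torch.rand(1000, 2))
```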
ISBN: (Digital) 9781665487399; (Print) 9781665487399
Skin lesion image datasets gained popularity in recent years with the successes of the ISIC datasets and challenges. While the user base of these datasets is growing, the Dark Corner Artifact (DCA) phenomenon remains underexplored. This paper provides a better understanding of how and why DCA occurs and of the types of DCA, and investigates DCA within a curated ISIC image dataset. We introduce new image-artifact labels on a curated, balanced dataset of 9,810 images and identify 2,631 images with different intensities of DCA. We then improve the quality of this dataset by introducing automated DCA detection and removal methods. We evaluate the performance of our methods with image quality metrics on an unseen dataset (Dermofit) and achieve better SSIM scores at every DCA intensity level. Further, we study the effects of DCA removal on a binary classification task (melanoma vs. non-melanoma). Although deep learning performance on this task shows only marginal differences, we demonstrate that DCA removal helps shift the network activations toward the skin lesions. All the artifact labels and code are available at: https://***/mmu-dermatologyresearch/dark_corner_artifact_removal.
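One hedged way to detect and remove dark corners is sketched below with OpenCV: flag dark pixels that fall inside the four corner regions and inpaint them. The corner-region heuristic, thresholds, and Telea inpainting are assumptions for illustration rather than the paper's exact method.

```python
# Sketch of a simple dark-corner detection/removal pass; thresholds are illustrative.
import cv2
import numpy as np

def detect_dca_mask(img_gray, dark_thresh=40, corner_frac=0.35):
    """Flag dark pixels that fall inside the four corner regions."""
    h, w = img_gray.shape
    ch, cw = int(h * corner_frac), int(w * corner_frac)
    corner_mask = np.zeros((h, w), dtype=np.uint8)
    corner_mask[:ch, :cw] = corner_mask[:ch, -cw:] = 1
    corner_mask[-ch:, :cw] = corner_mask[-ch:, -cw:] = 1
    return ((img_gray < dark_thresh) & (corner_mask == 1)).astype(np.uint8)

def remove_dca(img_bgr):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    mask = detect_dca_mask(gray)
    if mask.sum() == 0:
        return img_bgr  # no dark-corner artifact detected
    return cv2.inpaint(img_bgr, mask * 255, inpaintRadius=5, flags=cv2.INPAINT_TELEA)

# Example on a synthetic image with darkened corners.
img = np.full((256, 256, 3), 180, dtype=np.uint8)
img[:60, :60] = img[:60, -60:] = img[-60:, :60] = img[-60:, -60:] = 5
clean = remove_dca(img)
```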
ISBN: (Digital) 9781665487399; (Print) 9781665487399
Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised approaches. However, these methods are unable to acquire new knowledge incrementally; they are, in fact, mostly used only as a pre-training phase over IID data. In this work we investigate self-supervised methods in continual learning regimes without any replay mechanism. We show that naive functional regularization, also known as feature distillation, leads to lower plasticity and limits continual learning performance. Instead, we propose Projected Functional Regularization, in which a separate temporal projection network ensures that the newly learned feature space preserves information from the previous one, while at the same time allowing for the learning of new features. This prevents forgetting while maintaining the plasticity of the learner. Comparison with other incremental learning approaches applied to self-supervision demonstrates that our method obtains competitive performance in different scenarios and on multiple datasets.
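A minimal PyTorch sketch of the idea, assuming an MSE distillation term and illustrative dimensions (the actual objective and projector design may differ): the new encoder's features are pulled toward the frozen old features only after passing through a learned temporal projector, leaving the new feature space free to change.

```python
# Sketch of a projected functional-regularization term; names and loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim = 256
projector = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))

def pfr_loss(new_feats, old_feats):
    """Distill through the projector so the new space keeps old information
    while remaining free to allocate capacity to new features."""
    projected = projector(new_feats)
    return F.mse_loss(projected, old_feats.detach())

# Inside a training step (ssl_loss would come from the self-supervised objective in use):
new_feats = torch.randn(32, feat_dim, requires_grad=True)   # current encoder output
old_feats = torch.randn(32, feat_dim)                        # frozen previous-task encoder output
loss = pfr_loss(new_feats, old_feats)                        # added to the SSL loss with a weight
loss.backward()
```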
ISBN: (Digital) 9781665487399; (Print) 9781665487399
Vision transformers (ViTs) have recently attracted considerable attention, but their huge computational cost remains an issue for practical deployment. Previous ViT pruning methods tend to prune the model along a single dimension, which may cause excessive reduction and lead to sub-optimal model quality. In contrast, we advocate a multi-dimensional ViT compression paradigm and propose to harness redundancy reduction across the attention-head, neuron, and sequence dimensions jointly. First, we propose a statistical-dependence-based pruning criterion that is generalizable to different dimensions for identifying deleterious components. Moreover, we cast multi-dimensional ViT compression as an optimization problem, the objective of which is to learn an optimal pruning policy across the three dimensions that maximizes the compressed model's accuracy under a computational budget. The problem is solved by an adapted Gaussian process search with expected improvement. Experimental results show that our method effectively reduces the computational cost of various ViT models. For example, our method reduces FLOPs by 40% without top-1 accuracy loss for DeiT and T2T-ViT models on the ImageNet dataset, outperforming previous state-of-the-art ViT pruning methods.
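As a hedged sketch of a budget-constrained Gaussian process search with expected improvement (scikit-learn is used as the surrogate; evaluate_accuracy and the FLOPs proxy are placeholders, not the paper's implementation), the loop below searches over three pruning ratios for heads, neurons, and tokens.

```python
# Sketch of GP + expected-improvement search over (head, neuron, token) keep ratios.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def flops_ratio(r):              # crude proxy: each kept ratio scales FLOPs roughly linearly
    return r[0] * r[1] * r[2]

def evaluate_accuracy(r):        # placeholder objective (replace with prune + quick eval of the ViT)
    return 1.0 - ((r - 0.7) ** 2).sum()

budget, rng = 0.6, np.random.default_rng(0)
X = rng.uniform(0.3, 1.0, size=(8, 3))                       # initial random pruning policies
X = X[[flops_ratio(x) <= budget for x in X]]
y = np.array([evaluate_accuracy(x) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.uniform(0.3, 1.0, size=(256, 3))
    cand = cand[[flops_ratio(c) <= budget for c in cand]]     # keep feasible policies only
    mu, sigma = gp.predict(cand, return_std=True)
    imp = mu - y.max()
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)              # expected improvement
    best = cand[ei.argmax()]
    X = np.vstack([X, best])
    y = np.append(y, evaluate_accuracy(best))

print("best pruning ratios (heads, neurons, tokens):", X[y.argmax()])
```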
ISBN: (Digital) 9781665487399; (Print) 9781665487399
Semi-supervised learning is a highly researched problem, but existing semi-supervised object detection frameworks are based on RGB images, and existing pre-trained models cannot be used for hyperspectral images. To overcome these difficulties, this paper first selects a small set of data augmentation methods suited to the characteristics of hyperspectral images to improve the accuracy of a supervised model trained on the labeled training set. Next, in order to make full use of the unlabeled training set, we generate pseudo-labels with the model trained in the first stage and mix the obtained pseudo-labels with the labeled training set. Then, a large number of strong data augmentation methods are added to further improve the final model. We achieve state-of-the-art results, with an AP of 26.35, on the Semi-Supervised Hyperspectral Object Detection Challenge (SSHODC) of the CVPR 2022 Perception Beyond the Visible Spectrum Workshop, winning first place in the challenge.
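The staged pipeline can be sketched as below; train_detector, predict, and the confidence threshold are placeholders standing in for the actual hyperspectral detector and its tuning, not the authors' code.

```python
# Sketch of the two-stage pseudo-labelling pipeline with placeholder components.
from dataclasses import dataclass

@dataclass
class Sample:
    image_id: int
    boxes: list          # [(x1, y1, x2, y2, class_id), ...]

def train_detector(samples, strong_augment=False):
    """Placeholder: return a 'model' that just records its training setup."""
    return {"train_size": len(samples), "strong_augment": strong_augment}

def predict(model, image_id):
    """Placeholder prediction: one box with a confidence score."""
    return [((10, 10, 50, 50, 0), 0.9)]

labeled = [Sample(i, [(0, 0, 20, 20, 0)]) for i in range(100)]
unlabeled_ids = list(range(100, 400))

# Stage 1: supervised model with a small set of suitable (weak) augmentations.
teacher = train_detector(labeled, strong_augment=False)

# Stage 2: pseudo-label the unlabeled set, keep confident boxes,
# mix with the labeled set, and retrain with strong augmentations.
conf_thresh = 0.5
pseudo = []
for image_id in unlabeled_ids:
    boxes = [b for b, score in predict(teacher, image_id) if score >= conf_thresh]
    if boxes:
        pseudo.append(Sample(image_id, boxes))

student = train_detector(labeled + pseudo, strong_augment=True)
print("final training set size:", student["train_size"])
```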
ISBN: (Digital) 9781665487399; (Print) 9781665487399
The goal of meta-learning is to generalize to new tasks and goals as quickly as possible. Ideally, we would like approaches that generalize to new goals and tasks on the first attempt. Requiring a policy to perform a new task on the first attempt, without even a single example trajectory, is a zero-shot problem formulation. When tasks are identified by goal images, the tasks can be considered visually goal-directed. In this work, we explore the problem of visual goal-directed zero-shot meta-imitation learning. Inspired by several popular approaches to meta-RL, we combine core ideas related to task embeddings and planning by gradient descent to explore this problem. To evaluate these approaches, we adapted the Meta-World benchmark tasks to create 24 distinct visual goal-directed manipulation tasks. We found that 7 of the 24 tasks could be successfully completed on the first attempt by at least one of the approaches we tested. We also demonstrate that goal-directed zero-shot approaches can transfer to a physical robot, with a demonstration based on Jenga block manipulation tasks using a Kinova Jaco robotic arm.
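As a rough, hedged illustration of the planning-by-gradient-descent ingredient (the encoder and latent dynamics model here are untrained placeholders, not the meta-trained components an actual system would use), one can optimize an action sequence so that the predicted final latent state matches the goal-image embedding.

```python
# Sketch: gradient-descent planning towards a goal embedding with placeholder models.
import torch
import torch.nn as nn

state_dim, act_dim, horizon = 16, 4, 10
encoder = nn.Linear(3 * 32 * 32, state_dim)           # goal image -> embedding (placeholder)
dynamics = nn.Linear(state_dim + act_dim, state_dim)  # latent forward model (placeholder)

goal_image = torch.rand(1, 3 * 32 * 32)
goal_emb = encoder(goal_image).detach()
state = torch.zeros(1, state_dim)

actions = torch.zeros(horizon, 1, act_dim, requires_grad=True)
opt = torch.optim.Adam([actions], lr=0.1)

for _ in range(100):                                   # inner planning loop
    s = state
    for t in range(horizon):
        s = dynamics(torch.cat([s, actions[t]], dim=-1))
    loss = ((s - goal_emb) ** 2).mean()                # reach the goal embedding at the horizon
    opt.zero_grad()
    loss.backward()
    opt.step()
```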
ISBN: (Digital) 9781665487399; (Print) 9781665487399
Due to the lack of a large-scale reflection removal dataset with diverse real-world scenes, many existing reflection removal methods are trained on synthetic data plus a small amount of real-world data, which makes it difficult to thoroughly evaluate the strengths and weaknesses of different reflection removal methods. Furthermore, existing real-world benchmarks and datasets do not categorize image data based on the types and appearances of reflections (e.g., smoothness, intensity), making it hard to analyze reflection removal methods. Hence, we construct a new reflection removal dataset that is categorized, diverse, and real-world (CDR). A pipeline based on RAW data is used to capture perfectly aligned input images and transmission images. The dataset is constructed using diverse glass types under various environments to ensure diversity. By analyzing several reflection removal methods and conducting extensive experiments on our dataset, we show that state-of-the-art reflection removal methods generally perform well on blurry reflections but fail to obtain satisfactory performance on other types of real-world reflections. We believe our dataset can help the development of novel methods that better remove real-world reflections.
ISBN: (Digital) 9781665487399; (Print) 9781665487399
Human action recognition (AR) in the dark is a subtask that is gaining a lot of traction and occupies a significant place in the field of computer vision. Its applications include autonomous driving at night, human pose estimation, and night surveillance. Solutions such as DLN have emerged for AR, but due to poor accuracy even when leveraging large datasets and complex architectures, progress on AR in the dark has been slow. In this paper, we propose a novel and straightforward method, Z-Domain Entropy Adaptable Flex (Z-DEAF). It builds on an R(2+1)D neural network architecture and includes (i) a self-attention mechanism that combines and extracts corresponding and complementary features from the dual pathways; (ii) Zero-DCE low-light image enhancement, which improves image quality; and (iii) the FlexMatch method, which generates pseudo-labels flexibly. With the help of pseudo-labels from FlexMatch, our proposed Z-DEAF method facilitates learning the desired classification boundaries by repeatedly alternating between Expanding Entropy and Shrinking Entropy, which addresses the problem of unclear classification boundaries between categories. Our model obtains superior performance in experiments and achieves state-of-the-art results on ARID.
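As a hedged sketch of the flexible pseudo-labelling component (FlexMatch-style class-wise thresholds; the base threshold, the mapping function, and the omitted R(2+1)D backbone and enhancement stage are assumptions for illustration):

```python
# Sketch: class-dependent pseudo-label acceptance for unlabeled clips.
import torch

def flexible_thresholds(learning_effect, base_tau=0.95):
    """Scale the base threshold per class by its estimated learning status."""
    beta = learning_effect / learning_effect.max().clamp(min=1)
    return base_tau * beta

def select_pseudo_labels(probs, learning_effect):
    """probs: (N, C) softmax outputs on unlabeled clips."""
    conf, pred = probs.max(dim=1)
    tau = flexible_thresholds(learning_effect)          # (C,) per-class thresholds
    keep = conf >= tau[pred]                            # class-dependent acceptance
    return pred[keep], keep

# Example: 5 unlabeled clips, 3 action classes.
probs = torch.softmax(torch.randn(5, 3), dim=1)
learning_effect = torch.tensor([10.0, 4.0, 7.0])        # confident predictions seen per class
labels, mask = select_pseudo_labels(probs, learning_effect)
```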
ISBN: (Digital) 9781665487399; (Print) 9781665487399
Recognizing the types of pollen grains and estimating their proportion in pollen mixture samples collected in a specific geographical area is important for agricultural, medical, and ecosystem research. Our paper adopts a convolutional neural network for the automatic segmentation of pollen species in microscopy images, and proposes an original strategy to train such a network at reasonable manual annotation cost. Our approach is founded on a large dataset composed of pure pollen images. It first (semi-)manually segments foreground (i.e., pollen grains) and background in a fraction of those images, and uses the resulting annotated dataset to train a universal pollen segmentation CNN. In the second step, this model is used to automatically segment a large number of additional pure pollen images, so as to supervise the training of a pollen species segmentation model. Although it has been trained on pure images only, the model is shown to provide accurate segmentation of species in pollen mixtures. Our experiments also demonstrate that dedicating a model to the segmentation of a subset of the available pure pollen species makes it possible to train a bin pollen class, corresponding to pollen species that are not in the subset of species recognized by the model. This strategy is useful for coping with unexpected species in a mixture.
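The two-step supervision strategy can be schematized as below, with crude placeholder "models" so the flow is runnable; the key point is that pure-pollen images carry an image-level species label, so a foreground/background segmenter suffices to turn them into pixel-wise species supervision. All names and thresholds are assumptions.

```python
# Sketch of the two-step training strategy with placeholder segmentation models.
import numpy as np

def train_binary_segmenter(images, masks):
    """Placeholder universal pollen-vs-background model."""
    return lambda img: (img > img.mean()).astype(np.uint8)     # crude foreground proxy

def train_species_segmenter(images, species_masks):
    return {"num_training_images": len(images)}

# Step 1: (semi-)manual masks on a fraction of the pure-pollen images.
annotated_imgs = [np.random.rand(64, 64) for _ in range(20)]
annotated_masks = [(im > 0.5).astype(np.uint8) for im in annotated_imgs]
binary_model = train_binary_segmenter(annotated_imgs, annotated_masks)

# Step 2: auto-segment many more pure images; the species id of each image is known,
# so foreground pixels inherit that id (0 is kept for background).
pure_imgs = [np.random.rand(64, 64) for _ in range(200)]
pure_species = np.random.randint(1, 6, size=200)               # species ids 1..5
species_masks = [binary_model(im) * sid for im, sid in zip(pure_imgs, pure_species)]
species_model = train_species_segmenter(pure_imgs, species_masks)
```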
ISBN: (Digital) 9781665487399; (Print) 9781665487399
Recent studies on deep convolutional neural networks present a simple paradigm of architecture design, i.e., models with more MACs typically achieve better accuracy, such as EfficientNet and RegNet. These works try to enlarge the network architecture with one unified rule obtained by sampling and statistical methods. However, this rule does not extrapolate well to the design of large networks because it is derived from researchers' experience with small network architectures. In this paper, we propose to enlarge the capacity of CNN models by fine-grained MACs allocation for width, depth, and resolution at the stage level. In particular, starting from a small base model, we gradually add extra channels, layers, or resolution in a dynamic programming manner. By modifying the computation of different stages step by step, the enlarged network is equipped with an optimal allocation and utilization of MACs. On EfficientNet, our method consistently outperforms the original scaling method. In particular, when the proposed method is used to enlarge models sourced from GhostNet, we achieve state-of-the-art 80.9% and 84.3% ImageNet top-1 accuracy under 600M and 4.4B MACs, respectively.
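As a hedged, simplified stand-in for the stage-level allocation (a greedy gain-per-MAC loop rather than the paper's dynamic program; the cost and gain estimates are placeholders a real search would measure by profiling and evaluation):

```python
# Sketch: greedily spend an extra MACs budget on the stage/dimension with the best
# estimated accuracy gain per MAC.
import itertools

stages = ["stage1", "stage2", "stage3", "stage4"]
actions = ["add_channels", "add_layer", "raise_resolution"]
config = {s: {"channels": 32, "layers": 2, "resolution": 224} for s in stages}

def mac_cost(stage, action):
    # Crude proxy costs in MMACs; a real implementation would profile the model.
    return {"add_channels": 20, "add_layer": 35, "raise_resolution": 50}[action]

def estimate_gain(stage, action, config):
    # Placeholder accuracy-gain estimate with diminishing returns per stage.
    depth_penalty = 1.0 / (1 + config[stage]["layers"])
    base = {"add_channels": 0.10, "add_layer": 0.15, "raise_resolution": 0.12}[action]
    return base * depth_penalty

budget = 300  # extra MMACs allowed on top of the base model
while True:
    candidates = [(s, a) for s, a in itertools.product(stages, actions)
                  if mac_cost(s, a) <= budget]
    if not candidates:
        break
    # Pick the modification with the best estimated gain per MAC.
    s, a = max(candidates, key=lambda sa: estimate_gain(*sa, config) / mac_cost(*sa))
    budget -= mac_cost(s, a)
    if a == "add_channels":
        config[s]["channels"] += 16
    elif a == "add_layer":
        config[s]["layers"] += 1
    else:
        config[s]["resolution"] += 32

print(config)
```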