Autism spectrum disorder (ASD) is a neurological condition that impairs an individual's ability to connect and communicate with others. It begins in childhood and continues through adolescence and adulthood....
ISBN: (print) 9781665409155
We tackle the few-shot open-set recognition (FSOSR) problem in the context of remote sensing hyperspectral image (HSI) classification. Prior research on OSR mainly considers an empirical threshold on the class prediction scores to reject the outlier samples. Further, recent endeavors in few-shot HSI classification fail to recognize outliers due to the "closed-set" nature of the problem and the fact that the entire class distributions are unknown during training. To this end, we propose to optimize a novel outlier calibration network (OCN) together with a feature extraction module during the meta-training phase. The feature extractor is equipped with a novel residual 3D convolutional block attention network (R3CBAM) for enhanced spectral-spatial feature learning from HSI. Our method rejects the outliers based on OCN prediction scores, removing the need for manual thresholding. Finally, we propose to augment the query set with synthesized support set features during the similarity learning stage in order to combat the data scarcity issue of few-shot learning. The superiority of the proposed model is showcased on four benchmark HSI datasets.
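For context, the threshold-based rejection that the abstract attributes to prior OSR work can be sketched as follows. This is a minimal illustration, not the proposed OCN: the function names, the softmax scoring, and the default threshold of 0.5 are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_set(logits, threshold=0.5):
    """Return the predicted class index, or -1 to reject the sample as an outlier.

    The threshold is a hand-picked value (an assumption here), which is the
    manual step the abstract says its method avoids.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else -1


confident = predict_open_set([5.0, 0.0, 0.0])   # one class clearly dominates
ambiguous = predict_open_set([0.1, 0.0, 0.05])  # scores are nearly uniform
```

In this sketch the ambiguous sample is rejected only because its top probability falls below the chosen threshold, so the outcome depends directly on how that value is tuned.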
Eye state detection is an essential task in computer vision with diverse applications, including emotion recognition, fatigue detection in high-risk areas, and computer interaction. This paper introduces a classificati...
ISBN: (print) 9781665445092
We present an unsupervised learning approach for optical flow estimation that improves the upsampling and learning of pyramid networks. We design a self-guided upsample module to tackle the interpolation blur problem caused by bilinear upsampling between pyramid levels. Moreover, we propose a pyramid distillation loss that adds supervision for intermediate levels by distilling the finest flow as pseudo labels. By integrating these two components, our method achieves the best performance for unsupervised optical flow learning on multiple leading benchmarks, including MPI-Sintel, KITTI 2012 and KITTI 2015. In particular, we achieve EPE=1.4 on KITTI 2012 and F1=9.38% on KITTI 2015, outperforming the previous state-of-the-art methods by 22.2% and 15.7%, respectively.
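The abstract reports EPE without defining it. Assuming the conventional meaning (endpoint error, the mean Euclidean distance between predicted and ground-truth flow vectors), the metric can be sketched as below. The function name and the list-of-tuples input format are assumptions for illustration, and benchmark-specific details such as masks for invalid pixels are not modeled.

```python
import math


def endpoint_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    Each flow is a list of (u, v) displacement pairs, one per pixel.
    """
    if len(pred_flow) != len(gt_flow) or not pred_flow:
        raise ValueError("flows must be non-empty and of equal length")
    total = 0.0
    for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow):
        total += math.hypot(pu - gu, pv - gv)
    return total / len(pred_flow)


# Two pixels: one prediction is off by a (3, 4) vector, the other is exact.
epe = endpoint_error([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
```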
The innovative generation of vector graphics with fine-grained images using artificial intelligence has become an important task in edge extraction. In this paper, we take Qiang embroidery images as an example due to i...
ISBN: (print) 9781665445092
Weight sharing has become a de facto standard in neural architecture search because it enables the search to be done on commodity hardware. However, recent works have empirically shown a ranking disorder between the performance of stand-alone architectures and that of the corresponding shared-weight networks. This violates the main assumption of weight-sharing NAS algorithms, thus limiting their effectiveness. We tackle this issue by proposing a regularization term that aims to maximize the correlation between the performance rankings of the shared-weight network and those of the stand-alone architectures using a small set of landmark architectures. We incorporate our regularization term into three different NAS algorithms and show that it consistently improves performance across algorithms, search spaces, and tasks.
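The "ranking disorder" described above can be quantified with a rank correlation. The sketch below uses Kendall's tau (the tau-a variant, which ignores ties) as one common choice; this is an assumption for illustration, since the abstract does not say which correlation measure is used, and it measures the disorder rather than implementing the proposed regularization term. The function name and score lists are hypothetical.

```python
from itertools import combinations


def kendall_tau(shared_scores, standalone_scores):
    """Kendall's tau-a between two score lists over the same architectures.

    +1 means the shared-weight ranking matches the stand-alone ranking
    exactly; -1 means it is fully reversed.
    """
    n = len(shared_scores)
    if n != len(standalone_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (shared_scores[i] - shared_scores[j]) * (
            standalone_scores[i] - standalone_scores[j]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical accuracies for three landmark architectures.
tau = kendall_tau([0.61, 0.65, 0.70], [0.90, 0.94, 0.92])
```

Here two of the three pairs agree in order and one disagrees, so the correlation is positive but well short of 1.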
ISBN: (print) 9781665409155
In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video. These methods typically uniformly sample a segment of an input clip (along the temporal dimension). However, not all parts of a video are equally important to determine the action in the clip. In this work, we focus instead on learning where to extract features, so as to focus on the most informative parts of the video. We propose a method called non-uniform temporal aggregation (NUTA), which aggregates features only from informative temporal segments. We also introduce a synchronization method that allows our NUTA features to be temporally aligned with traditional uniformly sampled video features, so that both local and clip-level features can be combined. Our model has achieved state-of-the-art performance on four widely used large-scale action-recognition datasets (Kinetics400, Kinetics700, Something-Something V2 and Charades). In addition, we have created a visualization to illustrate how the proposed NUTA method selects only the most relevant parts of a video clip.
ISBN: (print) 9781665445092
We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques. Unlike the standard BN, where the statistics are computed within each batch, EMAN, used in the teacher, updates its statistics by exponential moving average from the BN statistics of the student. This design reduces the intrinsic cross-sample dependency of BN and enhances the generalization of the teacher. EMAN improves strong baselines for self-supervised learning by 4-6/1-2 points and semi-supervised learning by about 7/2 points, when 1%/10% supervised labels are available on ImageNet. These improvements are consistent across methods, network architectures, training duration, and datasets, demonstrating the general effectiveness of this technique. The code will be made available online.
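The statistics update described above can be sketched as a plain exponential moving average. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the dictionary layout (`mean`, `var`), and the momentum value of 0.99 are all hypothetical.

```python
def eman_update(teacher_stats, student_stats, momentum=0.99):
    """Move the teacher's normalization statistics toward the student's.

    Both arguments map a statistic name to a list of per-channel values.
    A new dictionary is returned; the inputs are left unchanged.
    """
    return {
        name: [
            momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_stats[name], student_stats[name])
        ]
        for name in teacher_stats
    }


# Hypothetical two-channel statistics.
teacher = {"mean": [0.0, 1.0], "var": [1.0, 1.0]}
student = {"mean": [1.0, 1.0], "var": [2.0, 1.0]}
teacher = eman_update(teacher, student, momentum=0.9)
```

Because the teacher never computes statistics from its own batch in this scheme, its normalization does not depend on which samples happen to share a batch, which is the cross-sample dependency the abstract refers to.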
ISBN: (print) 9789819984619; 9789819984626
Semantic segmentation of Remote Sensing Images (RSIs) is an essential application for precision agriculture, environmental protection, and economic assessment. While UNet-based networks have made significant progress, they still face challenges in capturing long-range dependencies and preserving fine-grained details. To address these limitations and improve segmentation accuracy, we propose an effective method, namely UAM-Net (UNet with Attention-based Multi-level feature fusion), to enhance global contextual understanding and maintain fine-grained information. Specifically, UAM-Net incorporates three key modules. First, the Global Context Guidance Module (GCGM) integrates semantic information from the Pyramid Pooling Module (PPM) into each decoder stage. Second, the Triple Attention Module (TAM) effectively addresses feature discrepancies between the encoder and decoder. Finally, the computationally efficient Linear Attention Module (LAM) seamlessly fuses coarse-level feature maps with multiple decoder stages. With these modules working together, UAM-Net significantly outperforms most state-of-the-art methods on two popular benchmarks.
The paper discusses the importance of health and wellness in today's tech-dependent society and introduces a model named "Food Diet Recaller App - FDRA". This app employs AI and computer vision to h...