ISBN (digital): 9781510661677
ISBN (print): 9781510661660
Iris recognition is a widely used biometric technology that offers high accuracy and reliability in well-controlled environments. However, recognition accuracy can degrade significantly in non-ideal scenarios, such as off-angle iris images. To address these challenges, deep learning frameworks have been proposed to identify subjects from their off-angle iris images. Traditional CNN-based iris recognition systems train a single deep network on multiple off-angle iris images of the same subject to extract gaze-invariant features, then test incoming off-angle images with this single network to classify them into the correct subject class. In another approach, multiple shallow networks are trained, one per gaze angle, each serving as an expert for its specific angle. When testing an off-angle iris image, we first estimate the gaze angle and feed the probe image to the corresponding network for recognition. In this paper, we present an analysis of the performance of both single and multi-model deep learning frameworks for identifying subjects from their off-angle iris images. Specifically, we compare the performance of a single AlexNet with multiple SqueezeNet models. SqueezeNet is a variant of AlexNet that uses 50x fewer parameters and is optimized for devices with limited computational resources. The multi-model approach uses multiple shallow networks, where each network is an expert for a specific gaze angle. Our experiments are conducted on an off-angle iris dataset consisting of 100 subjects captured at 10-degree intervals from -50 to +50 degrees. The results indicate that angles farther from the trained angles yield lower model accuracy than angles closer to the trained gaze angle. Our findings suggest that the use of SqueezeNet, which requires fewer parameters than AlexNet, can enable iris recognition on devices with limited computational resources while maintaining accuracy. Overall, the results of this study can contribute to the deve...
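The multi-expert routing described above can be sketched as follows. This is a hypothetical illustration, not the paper's code: the function names and the nearest-angle routing rule are assumptions, using the dataset's 10-degree intervals from -50 to +50 degrees.

```python
# Hypothetical sketch of multi-expert routing: one shallow network per
# trained gaze angle; a probe is routed to the expert whose trained
# angle is nearest the estimated gaze.
TRAINED_ANGLES = list(range(-50, 51, 10))  # 10-degree intervals, -50..+50

def nearest_expert_angle(estimated_gaze: float) -> int:
    """Return the trained gaze angle closest to the estimated gaze."""
    return min(TRAINED_ANGLES, key=lambda a: abs(a - estimated_gaze))

def route(probe_image, estimated_gaze, experts):
    """experts: dict mapping each trained angle to a recognition callable."""
    return experts[nearest_expert_angle(estimated_gaze)](probe_image)
```

A probe estimated at -37 degrees, for example, would be handled by the -40-degree expert.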
Given the ever-growing availability of remote sensing data (e.g., Gaofen in China, Sentinel in the EU, and Landsat in the USA), multimodal remote sensing techniques have been garnering increasing attention and have made extraordinary progress in various Earth observation (EO)-related tasks. The data acquired by different platforms can provide diverse and complementary information. The joint exploitation of multimodal remote sensing has been proven effective in improving existing methods of land-use/land-cover segmentation in urban environments. To boost technical breakthroughs and accelerate the development of EO applications across cities and regions, one important task is to build novel cross-city semantic segmentation models based on modern artificial intelligence technologies and emerging multimodal remote sensing data, leading to semantic segmentation models with high transferability among different cities and regions. The Cross-City Semantic Segmentation contest is organized in conjunction with the 13th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS).
Lately, video-language pre-training and text-video retrieval have attracted significant attention with the explosion of multimedia data on the Internet. However, existing approaches for video-language pre-training typically limit the exploitation of the hierarchical semantic information in videos, such as frame-level semantic information and global video-level semantic information. In this work, we present an end-to-end pre-training network with Hierarchical Matching and Momentum Contrast named HMMC. The key idea is to explore the hierarchical semantic information in videos via multilevel semantic matching between videos and texts. This design is motivated by the observation that if a video semantically matches a text (which can be a title, tag, or caption), the frames in this video usually have semantic connections with the text and show higher similarity than frames in other videos. Hierarchical matching is mainly realized by two proxy tasks: Video-Text Matching (VTM) and Frame-Text Matching (FTM). A third proxy task, Frame Adjacency Matching (FAM), is proposed to enhance the single visual modality representations while training from scratch. Furthermore, a momentum contrast framework is introduced into HMMC, enabling it to incorporate more negative samples for contrastive learning, which contributes to the generalization of the learned representations. We also collected a large-scale Chinese video-language dataset (over 763k unique videos) named CHVTT to explore the multilevel semantic connections between videos and texts. Experimental results on two major text-video retrieval benchmark datasets demonstrate the advantages of our methods. We release our code at https://***/cheetah003/HMMC.
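The hierarchical matching idea above, video-text matching on a pooled video embedding and frame-text matching on individual frames, can be illustrated with a minimal cosine-similarity sketch. The mean-pooling choice and function names are assumptions for illustration, not the HMMC implementation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def video_text_score(frame_embs, text_emb):
    # Video-Text Matching (VTM): pool frame embeddings into one
    # global video embedding, then score it against the text.
    video_emb = frame_embs.mean(axis=0)
    return cosine(video_emb, text_emb)

def frame_text_scores(frame_embs, text_emb):
    # Frame-Text Matching (FTM): score each frame against the text;
    # frames of a matching video should score higher than others.
    return [cosine(f, text_emb) for f in frame_embs]
```

A video whose frames align with the text yields a higher VTM score than a semantically unrelated one, which is the signal the contrastive objectives exploit.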
Deep learning-based image enhancement is challenging in underwater and medical imaging domains, where high-quality training data is often limited. Due to water distortion, loss of color, and contrast, images captured ...
Deep learning techniques are commonly utilized to tackle various computer vision problems, including recognition, segmentation, and classification from RGB images. With the availability of a diverse range of sensors, ...
The restoration of hyperspectral images (HSIs) is a crucial process that eliminates various types of noise to improve subsequent applications. To effectively utilize the inherent low-rankness and spatial smoothness of HSI data, this letter proposes a multimodal low-rank tensor subspace learning with total variation regularization (MLTSL-TV) model to denoise HSI data from the observed measurements. The proposed approach combines a low-rankness measure on subspace tensors with a learnable transform basis, enabling adaptive exploitation of potential low-rank structures through multimodal tensor factorization along multiple orientations of the observed HSI data. More importantly, we put forward a proximal alternating minimization (PAM) algorithm for efficiently solving the proposed model. Experiments were conducted on two simulated datasets and one real HSI dataset, comparing against representative approaches through both visual and quantitative analysis. The experimental results demonstrate that the proposed MLTSL-TV approach achieves satisfactory performance when compared to state-of-the-art methods.
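As a simplified illustration of the low-rank subspace idea only (not the authors' multimodal tensor model, TV regularizer, or PAM solver), projecting an unfolded pixels-by-bands HSI matrix onto its leading singular subspace suppresses full-rank noise while preserving the low-rank signal:

```python
import numpy as np

def lowrank_denoise(X, rank):
    """Project a (pixels x bands) matrix onto its top-`rank` subspace.

    HSI spectra are highly correlated, so the clean signal is
    approximately low-rank while noise spreads over all singular
    directions; truncating the SVD removes most of the noise energy.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```

The full MLTSL-TV model generalizes this intuition to learnable subspace tensors in multiple orientations and adds total variation to enforce spatial smoothness.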
Weakly supervised violence detection refers to the technique of training models to identify violent segments in videos using only video-level labels. Among these approaches, multimodal violence detection, which integr...