ISBN (digital): 9781510661677
ISBN (print): 9781510661660
Iris recognition is a widely used biometric technology that offers high accuracy and reliability in well-controlled environments. However, recognition accuracy can degrade significantly in non-ideal scenarios, such as off-angle iris images. To address these challenges, deep learning frameworks have been proposed to identify subjects from their off-angle iris images. Traditional CNN-based iris recognition systems train a single deep network on multiple off-angle iris images of the same subject to extract gaze-invariant features, then test incoming off-angle images with this single network to classify them into the correct subject class. In another approach, multiple shallow networks are trained, one per gaze angle, each serving as an expert for its specific angle. When testing an off-angle iris image, we first estimate the gaze angle and feed the probe image to the corresponding network for recognition. In this paper, we present an analysis of the performance of both single and multi-model deep learning frameworks for identifying subjects from their off-angle iris images. Specifically, we compare the performance of a single AlexNet with multiple SqueezeNet models. SqueezeNet is a variant of AlexNet that uses 50x fewer parameters and is optimized for devices with limited computational resources. The multi-model approach uses multiple shallow networks, where each network is an expert for a specific gaze angle. Our experiments are conducted on an off-angle iris dataset consisting of 100 subjects captured at 10-degree intervals from -50 to +50 degrees. The results indicate that angles farther from the trained angles yield lower model accuracy than angles closer to the trained gaze angle. Our findings suggest that the use of SqueezeNet, which requires fewer parameters than AlexNet, can enable iris recognition on devices with limited computational resources while maintaining accuracy. Overall, the results of this study can contribute to the deve...
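The multi-expert routing described above can be sketched as follows. This is a hypothetical illustration, not the paper's code: the function names and the nearest-angle routing rule are assumptions, using the dataset's 10-degree intervals from -50 to +50 degrees.

```python
# Hypothetical sketch of multi-expert routing: one shallow network per
# trained gaze angle; a probe is routed to the expert whose trained
# angle is nearest the estimated gaze.
TRAINED_ANGLES = list(range(-50, 51, 10))  # 10-degree intervals, -50..+50

def nearest_expert_angle(estimated_gaze: float) -> int:
    """Return the trained gaze angle closest to the estimated gaze."""
    return min(TRAINED_ANGLES, key=lambda a: abs(a - estimated_gaze))

def route(probe_image, estimated_gaze, experts):
    """experts: dict mapping each trained angle to a recognition callable."""
    return experts[nearest_expert_angle(estimated_gaze)](probe_image)
```

A probe estimated at -37 degrees, for example, would be handled by the -40-degree expert.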
Given the ever-growing availability of remote sensing data (e.g., Gaofen in China, Sentinel in the EU, and Landsat in the USA), multimodal remote sensing techniques have been garnering increasing attention and have made extraordinary progress in various Earth observation (EO)-related tasks. The data acquired by different platforms can provide diverse and complementary information. The joint exploitation of multimodal remote sensing has been proven effective in improving existing methods of land-use/land-cover segmentation in urban environments. To boost technical breakthroughs and accelerate the development of EO applications across cities and regions, one important task is to build novel cross-city semantic segmentation models based on modern artificial intelligence technologies and emerging multimodal remote sensing data, leading to semantic segmentation models with high transferability among different cities and regions. The Cross-City Semantic Segmentation contest is organized in conjunction with the 13th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS).
Lately, video-language pre-training and text-video retrieval have attracted significant attention with the explosion of multimedia data on the Internet. However, existing approaches for video-language pre-training typically limit the exploitation of the hierarchical semantic information in videos, such as frame-level semantic information and global video-level semantic information. In this work, we present an end-to-end pre-training network with Hierarchical Matching and Momentum Contrast named HMMC. The key idea is to explore the hierarchical semantic information in videos via multilevel semantic matching between videos and texts. This design is motivated by the observation that if a video semantically matches a text (which can be a title, tag, or caption), the frames in this video usually have semantic connections with the text and show higher similarity than frames in other videos. Hierarchical matching is mainly realized by two proxy tasks: Video-Text Matching (VTM) and Frame-Text Matching (FTM). A third proxy task, Frame Adjacency Matching (FAM), is proposed to enhance the single visual modality representations while training from scratch. Furthermore, a momentum contrast framework is introduced into HMMC, enabling it to incorporate more negative samples for contrastive learning, which contributes to the generalization of the learned representations. We also collected a large-scale Chinese video-language dataset (over 763k unique videos) named CHVTT to explore the multilevel semantic connections between videos and texts. Experimental results on two major text-video retrieval benchmark datasets demonstrate the advantages of our methods. We release our code at https://***/cheetah003/HMMC.
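The hierarchical matching idea above, video-text matching on a pooled video embedding and frame-text matching on individual frames, can be illustrated with a minimal cosine-similarity sketch. The mean-pooling choice and function names are assumptions for illustration, not the HMMC implementation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def video_text_score(frame_embs, text_emb):
    # Video-Text Matching (VTM): pool frame embeddings into one
    # global video embedding, then score it against the text.
    video_emb = frame_embs.mean(axis=0)
    return cosine(video_emb, text_emb)

def frame_text_scores(frame_embs, text_emb):
    # Frame-Text Matching (FTM): score each frame against the text;
    # frames of a matching video should score higher than others.
    return [cosine(f, text_emb) for f in frame_embs]
```

A video whose frames align with the text yields a higher VTM score than a semantically unrelated one, which is the signal the contrastive objectives exploit.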
Deep learning-based image enhancement is challenging in underwater and medical imaging domains, where high-quality training data is often limited. Due to water distortion, loss of color, and contrast, images captured ...
Deep learning techniques are commonly utilized to tackle various computer vision problems, including recognition, segmentation, and classification from RGB images. With the availability of a diverse range of sensors, ...
The restoration of hyperspectral images (HSIs) is a crucial process that eliminates various types of noise to improve subsequent applications. To effectively utilize the inherent low-rankness and spatial smoothness of HSI data, this letter proposes a multimodal low-rank tensor subspace learning with total variation regularization (MLTSL-TV) model to denoise HSI data from the observed measurements. The proposed approach combines a low-rankness measure on subspace tensors with a learnable transform basis, enabling adaptive exploitation of potential low-rank structures through multimodal tensor factorization along multiple orientations of the observed HSI data. More importantly, we put forward a proximal alternating minimization (PAM) algorithm for efficiently solving the proposed model. Experiments were conducted on two simulated datasets and one real HSI dataset, comparing against representative approaches through both visual and quantitative analysis. The experimental results demonstrate that the proposed MLTSL-TV approach achieves satisfactory performance when compared to state-of-the-art methods.
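As a simplified illustration of the low-rank subspace idea only (not the authors' multimodal tensor model, TV regularizer, or PAM solver), projecting an unfolded pixels-by-bands HSI matrix onto its leading singular subspace suppresses full-rank noise while preserving the low-rank signal:

```python
import numpy as np

def lowrank_denoise(X, rank):
    """Project a (pixels x bands) matrix onto its top-`rank` subspace.

    HSI spectra are highly correlated, so the clean signal is
    approximately low-rank while noise spreads over all singular
    directions; truncating the SVD removes most of the noise energy.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```

The full MLTSL-TV model generalizes this intuition to learnable subspace tensors in multiple orientations and adds total variation to enforce spatial smoothness.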
Weakly supervised violence detection refers to the technique of training models to identify violent segments in videos using only video-level labels. Among these approaches, multimodal violence detection, which integr...