Convolutional neural networks (CNNs) have achieved impressive success in the multi-modal image processing (MIP) area. However, many existing CNN approaches fuse the features of the target and guidance images only once, which may cause a loss of information. To alleviate this problem, we present a multi-level bilateral interactive attention network (MBIAN) to fuse the features of the target and guidance images by their progressive interaction at different levels. Concretely, for each level, a bilateral interactive attention block (BIAB) is proposed to fuse the information of target and guidance images and refine their features. As the core component of our BIAB, a novel bilateral interactive attention layer (BIAL) is designed, where target and guidance images can mutually determine the attention weights. In addition, in each BIAB, long and short local shortcuts are employed to further facilitate the flow of information. Numerical experiments are conducted for three different problems, including panchromatic guided multi-spectral image super-resolution, near-infrared guided RGB image denoising, and flash-guided no-flash image denoising. The results demonstrate the versatility and superiority of MBIAN in terms of quantitative metrics and visual inspection, against 14 popular and state-of-the-art methods.
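The mutual weighting idea behind the bilateral interactive attention layer can be sketched roughly as follows. This is a minimal numpy illustration, not the paper's implementation: `w_t` and `w_g` stand in for learned (e.g. 1x1 convolution) weights, and a sigmoid gate replaces whatever normalization MBIAN actually uses; the point is only that each modality's features produce the attention map that gates the *other* modality.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bial(f_t, f_g, w_t, w_g):
    """Sketch of a bilateral interactive attention layer (BIAL-style):
    guidance features determine the attention gating the target features,
    and vice versa. f_t, f_g: (n, d) feature matrices; w_t, w_g: (d, d)
    illustrative learned projections."""
    a_t = sigmoid(f_g @ w_g)   # guidance -> attention weights for target
    a_g = sigmoid(f_t @ w_t)   # target   -> attention weights for guidance
    return f_t * a_t, f_g * a_g
```

Because the gates lie in (0, 1), each output is an element-wise attenuation of the corresponding input, with the attenuation pattern decided by the other modality.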
For multi-modal image processing, network interpretability is essential due to the complicated dependency across modalities. Recently, a promising research direction for interpretable networks is to incorporate dictionary learning into deep learning through an unfolding strategy. However, existing multi-modal dictionary learning models are both single-layer and single-scale, which restricts their representation ability. In this paper, we first introduce a multi-scale multi-modal convolutional dictionary learning (M²CDL) model, which is performed in a multi-layer strategy, to associate different image modalities in a coarse-to-fine manner. Then, we propose a unified framework, namely DeepM²CDL, derived from the M²CDL model for both multi-modal image restoration (MIR) and multi-modal image fusion (MIF) tasks. The network architecture of DeepM²CDL fully matches the optimization steps of the M²CDL model, which endows each network module with good interpretability. Different from handcrafted priors, both the dictionary and sparse feature priors are learned through the network. The performance of the proposed DeepM²CDL is evaluated on a wide variety of MIR and MIF tasks, showing its superiority over many state-of-the-art methods both quantitatively and qualitatively. In addition, we visualize the multi-modal sparse features and dictionary filters learned by the network, which demonstrates the good interpretability of the DeepM²CDL network.
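The unfolding strategy the abstract refers to turns each iteration of a sparse-coding solver into a network layer. A minimal sketch, assuming a plain ISTA iteration on a matrix dictionary (the paper's dictionaries are convolutional and multi-scale, and its exact update is not reproduced here):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_step(z, D, y, step, lam):
    """One ISTA iteration for min_z 0.5*||D z - y||^2 + lam*||z||_1.
    In an unfolded network, each such step becomes a layer whose
    dictionary D and threshold lam are learned from data."""
    grad = D.T @ (D @ z - y)
    return soft_threshold(z - step * grad, step * lam)
```

With a step size below 1 / ||D||², iterating this update monotonically decreases the objective, which is what makes the layer-per-iteration correspondence well defined.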
ISBN:
(Print) 9781510674219; 9781510674202
Machine learning has played a major role in various applications, including visual SLAM and thermal image processing. In this paper, we discuss the possibility of generating a thermal map using LWIR images and a deep-learning-based visual SLAM network, and the value such a thermal map can create. We summarize the advantages and applicability of various deep-learning-based visual SLAM methods and confirm the results of NICE-SLAM, which generates the dense map of greatest interest. Applying visual SLAM technology requires a time series, scene repetition, and images of each scene from various angles. However, most LWIR datasets consist of a single shot per scene or of unidirectional driving data. To address this, we created a scenario using an LWIR driving dataset and constructed a repetitive route by repeating traversals. RGB-Depth SLAM mapping was performed on the constructed dataset, the results were evaluated, and the limitations of the current approach were discussed. Finally, we summarize future directions for creating stable 3D thermal maps in indoor and outdoor environments by resolving these limitations.
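The route-repetition step described above (turning a unidirectional driving sequence into one with scene revisits) can be sketched as a simple list transform; the helper name and the forward-then-reverse scheme are illustrative assumptions, not the paper's exact procedure:

```python
def make_repeated_route(frames, repeats=2):
    """Turn a unidirectional frame sequence into a looped route:
    each traversal goes forward and then back (interior frames only,
    so endpoints are not duplicated at the seams), repeated `repeats`
    times. This gives SLAM multiple observations of each scene from
    both travel directions."""
    out = []
    for _ in range(repeats):
        out += frames + frames[-2:0:-1]  # forward pass, then return leg
    return out
```

For example, a three-frame sequence `[a, b, c]` with two repeats yields `[a, b, c, b, a, b, c, b]`, so every scene is seen more than once.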
This paper focuses on a novel strategy for increasing robustness with respect to local optima when using Mutual Information (MI) in multi-modal image registration. This is realized by integrating additional geometry information into the cost function. The main innovation is a generalization of multi-metric registration approaches by means of a metric homotopy. In particular, we realize a method that automatically determines the parameters of the metric homotopy. To construct a cost function independent of the choice of optimizer, the weighting is defined as a function of one of the metrics instead of optimizer steps. In addition, a differentiable cost function is developed. Compared to the commonly used technique of running an intensity-based registration at different resolutions, the proposed method is three times faster with unchanged accuracy. It is also shown that, in the presence of large landmark errors, the proposed method is more accurate than an approach in which both similarity functionals are applied one after the other. The method is evaluated on 3D multi-modal human brain datasets from the Retrospective Image Registration Evaluation (RIRE) project. The evaluation is performed using the RIRE project's evaluation website, making the registration results of the proposed method easily comparable to other methods; the presented results are therefore also available online on the RIRE project page.
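The key design point, a blend weight that depends on one of the metric values rather than on optimizer iterations, can be illustrated with a toy cost. The logistic weight and the `alpha` parameter below are assumptions for illustration only, not the paper's formulation:

```python
import numpy as np

def homotopy_cost(mi, geom, alpha=5.0):
    """Toy metric homotopy: blend an MI-based term and a geometric term.
    The weight w is a differentiable function of the MI value itself,
    so the combined cost is independent of the optimizer's step count
    and remains smooth for gradient-based optimization."""
    w = 1.0 / (1.0 + np.exp(-alpha * mi))  # higher MI -> rely more on MI term
    return w * mi + (1.0 - w) * geom
```

Since `w` lies strictly in (0, 1), the combined value is always a convex combination of the two metric values, and the transition between them is smooth rather than switched at a fixed iteration.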